About Perplexity AI

Perplexity is an AI-powered answer engine built to serve the world’s curiosity with fast, trustworthy answers grounded in the live web and backed by clear citations. It combines multiple leading models with real-time search to synthesize up-to-date, source-linked responses instead of traditional search results. On top of this foundation, Perplexity is rolling out Computer, a general-purpose AI worker that can use software like a human to research, build, and execute end-to-end workflows for users.

About the Role

The Data Platform team owns the end-to-end data lifecycle at Perplexity, from ingestion through processing, storage, and serving, powering product features, analytics, experimentation, AI workloads, and the company’s data lake.

The team defines the architecture for batch and streaming systems, the orchestration and observability stack, and a self-serve data platform, while thoughtfully combining platforms such as Databricks and Snowflake with open-source technologies including Spark, Kafka, Flink, Airflow, Dagster, dbt, Iceberg, Delta Lake, and ClickHouse.

In this senior/staff role, you will shape architecture, set standards, and drive the long-term technical direction of Perplexity’s data ecosystem.

Key Responsibilities

Design and operate large-scale batch and streaming data pipelines that directly power Perplexity product features, AI training and evaluation workflows, analytics, and experimentation.
Build event-driven and streaming systems (Kafka, Kinesis, Pub/Sub, or similar) for real-time ingestion, transformation, and delivery, alongside batch frameworks for backfills, aggregations, and offline computation.
Lead the architecture of data orchestration using tools like Airflow or Dagster, owning scheduling, dependency management, retries, SLAs, and end-to-end observability for critical data flows.
Set and enforce guarantees for data correctness, freshness, lineage, and recoverability, designing systems that handle rapid scale growth, partial failures, and evolving schemas without disrupting AI workloads or product experiences.
Build self-serve data platforms that let engineers, data scientists, and analysts safely discover data, define contracts, and create and operate their own pipelines with minimal friction.
Improve developer experience through better abstractions, opinionated paved paths, and standards for data modeling, testing, validation, and deployment, treating the data platform as a product used by many teams.
Drive architectural decisions across storage, compute, orchestration, and data APIs, partnering closely with product engineering and data science to align the data ecosystem with Perplexity’s roadmap.
Mentor engineers, review designs, and raise the technical bar for data infrastructure through thoughtful feedback, documentation, and hands-on collaboration.

Qualifications

5+ years (Senior) or 8+ years (Staff) of software engineering experience.
Strong experience building production data infrastructure systems.
Hands-on experience with batch and/or streaming data processing at scale.
Deep familiarity with data orchestration systems (Airflow, Dagster, or similar).
Proficiency in Python and at least one additional backend language (Go, TypeScript, etc.).
Strong systems thinking around reliability, latency, cost, and complexity tradeoffs.
Experience supporting ML/AI workflows, training pipelines, or evaluation systems.
Familiarity with data quality, lineage, observability, and governance tooling.
Prior ownership of internal platforms used by many teams.

If you’re excited about this role, we encourage you to apply even if your experience doesn’t match every qualification listed above.

Member of Technical Staff (Software Engineer, Data Platform)
