Member of Technical Staff (Software Engineer, Data Platform)

Perplexity · AI Frontier · San Francisco, CA · Platform & Infrastructure

Perplexity is seeking a Senior/Staff Software Engineer for its Data Platform team. The role focuses on designing, operating, and evolving large-scale batch and streaming data pipelines that support product features, AI training and evaluation, analytics, and experimentation. The engineer will lead architecture for orchestration, observability, and self-serve data platforms, ensuring data correctness, freshness, and recoverability for AI workloads. The role requires strong systems thinking, Python proficiency, and experience supporting ML/AI workflows.

What you'd actually do

  1. Design and operate large-scale batch and streaming data pipelines that directly power Perplexity product features, AI training and evaluation workflows, analytics, and experimentation.
  2. Build event-driven and streaming systems (Kafka, Kinesis, PubSub, or similar) for real-time ingestion, transformation, and delivery, alongside batch frameworks for backfills, aggregations, and offline computation.
  3. Lead the architecture of data orchestration using tools like Airflow or Dagster, owning scheduling, dependency management, retries, SLAs, and end-to-end observability for critical data flows.
  4. Set and enforce guarantees for data correctness, freshness, lineage, and recoverability, designing systems that handle rapid scale growth, partial failures, and evolving schemas without disrupting AI workloads or product experiences.
  5. Build self-serve data platforms that let engineers, data scientists, and analysts safely discover data, define contracts, and create and operate their own pipelines with minimal friction.

Skills

Required

  • 5+ years (Senior) or 8+ years (Staff) of software engineering experience.
  • Strong experience building production data infrastructure systems.
  • Hands-on experience with batch and/or streaming data processing at scale.
  • Deep familiarity with data orchestration systems (Airflow, Dagster, or similar).
  • Proficiency in Python and at least one additional backend language (Go, TypeScript, etc.).
  • Strong systems thinking around reliability, latency, cost, and complexity tradeoffs.
  • Experience supporting ML/AI workflows, training pipelines, or evaluation systems.
  • Familiarity with data quality, lineage, observability, and governance tooling.
  • Prior ownership of internal platforms used by many teams.

What the JD emphasized

  • AI training and evaluation workflows
  • ML workloads
  • data correctness, freshness, lineage, and recoverability
  • AI workloads
  • self-serve data platforms

Other signals

  • data platform for AI workloads
  • large-scale data pipelines
  • ML training and evaluation workflows