Senior Software Engineer, Data Infrastructure

Decagon · Vertical AI · San Francisco, CA · Engineering

This Senior Software Engineer, Data Infrastructure role focuses on building and operating the data systems that power AI products. It involves designing and implementing high-throughput data pipelines, streaming systems, and analytical data layers, with a strong emphasis on reliability, performance, and scalability. The role also partners with research and product teams to architect data solutions and optimize data paths for low latency.

What you'd actually do

  1. Design and implement high‑throughput data pipelines and streaming systems with strong SLOs, clear runbooks, and actionable telemetry.
  2. Build and operate real‑time and batch ingestion infrastructure using tools like Kafka, Flink, and Airflow.
  3. Own our analytical data layer: schema design, query performance, and cost optimization across ClickHouse, BigQuery, or similar.
  4. Partner with research and product teams to architect data solutions, evaluate performance, and scale new features.
  5. Tune pipeline and query latencies: optimize data paths, apply smart caching/partitioning, and hit tight p95/p99 targets.

Skills

Required

  • production data infrastructure
  • data pipelines
  • streaming systems
  • Kafka
  • Flink
  • Airflow
  • ClickHouse
  • BigQuery
  • dbt
  • observability
  • OpenTelemetry
  • Prometheus
  • Grafana
  • Datadog
  • incident response
  • Terraform
  • GitOps

Nice to have

  • CDC tooling
  • Debezium
  • orchestration frameworks
  • Dagster
  • Prefect
  • Spark
  • Dask
  • cloud data warehouses
  • Snowflake
  • Redshift
  • Databricks
  • experience as an early data/platform/infrastructure engineer
  • Kubernetes
  • GKE
  • EKS
  • AKS
  • multi-cloud
  • GCP
  • AWS
  • Azure
  • customer-managed deployments

What the JD emphasized

  • 5+ years building and operating production data infrastructure at scale
  • Hands-on experience with Tier 1 data technologies: ClickHouse, Kafka (or MSK/Pub‑Sub/RabbitMQ), and Flink or dbt
  • Proven track record meeting high availability and low latency targets across streaming and batch workloads
  • Excellent observability chops (OpenTelemetry, Prometheus/Grafana, Datadog) and strong incident response discipline

Other signals

  • design and build data systems for AI products
  • own critical data pipelines and storage layers
  • improve reliability and performance of data systems
  • create paved paths for engineers to work with data at scale
  • partner with research and product teams to architect data solutions