Senior Data Engineer

dbt Labs · Data AI · United States · Remote

The Senior Data Engineer at dbt Labs builds and maintains the core data platform infrastructure that powers internal analytics and data products. The role includes owning the data lakehouse architecture, transforming and serving data, establishing data contracts, driving platform architecture decisions, enhancing observability, and partnering cross-functionally. It also involves dogfooding new data infrastructure and AI technology.

What you'd actually do

  1. Own the architecture and operations of our data lakehouse, including object storage, table formats, table maintenance, and query engine integrations
  2. Build and maintain the infrastructure layer that transforms and serves data reliably at scale—from raw landing zones through to curated, queryable datasets
  3. Partner with product engineering to establish data contracts and schema standards around event telemetry, ensuring data arrives in the lakehouse in a form that's reliable and ready for downstream use (see the contract-enforcement sketch after this list)
  4. Drive decisions on data platform architecture, tooling, and engineering best practices across storage, compute, and access layers
  5. Enhance observability and monitoring of data infrastructure, including pipeline reliability, data freshness, and system performance
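
To ground the data-contracts item above, here is a minimal sketch of schema enforcement at the landing zone. It assumes PySpark 3.x reading JSON event files; the field names, the S3 path, and the target table are hypothetical, and a real contract would be defined once and shared with upstream teams rather than embedded in a single job.

```python
# Minimal sketch: enforcing a schema contract on raw events at the landing
# zone. Assumes PySpark 3.x; field names, the S3 path, and the target table
# are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("event-contract-check").getOrCreate()

# The agreed-upon contract: these fields, with these types.
event_schema = StructType([
    StructField("event_id", StringType(), nullable=False),
    StructField("event_type", StringType(), nullable=False),
    StructField("occurred_at", TimestampType(), nullable=False),
])

# FAILFAST surfaces malformed records at read time instead of letting them
# drift silently into curated datasets downstream.
events = (
    spark.read
    .schema(event_schema)
    .option("mode", "FAILFAST")
    .json("s3://landing-zone/events/")  # hypothetical landing-zone path
)

# Append to a curated, queryable table (hypothetical catalog/table name).
events.writeTo("analytics.curated_events").append()
```

FAILFAST makes contract violations fail loudly at ingest time rather than drifting into curated tables; the same schema definition could also back the freshness and reliability monitoring described in item 5.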

Skills

Required

  • SQL
  • Python
  • Data lakehouse architecture
  • Data contracts
  • Schema standards
  • Orchestration tools (Airflow, Dagster, Prefect)
  • Cloud infrastructure tooling (Terraform, Helm, Kubernetes)
  • Apache Spark
  • Data engineering

Nice to have

  • dbt
  • Apache Iceberg
  • SaaS or high-growth tech environment

What the JD emphasized

  • Expert-level SQL and Python skills
  • 5+ years of experience as a data engineer, and 8+ years of total experience in software engineering (including data engineering roles)
  • Strong knowledge of data lakehouse architecture, including storage layer design, table formats, and compute/query engine integration
  • Experience defining and enforcing data contracts or schema standards in collaboration with upstream engineering teams
  • Hands-on experience with modern orchestration tools like Airflow, Dagster, or Prefect (a minimal DAG sketch follows this list)
  • Working knowledge of cloud infrastructure tooling, including Terraform, Helm, and Kubernetes
  • Hands-on experience running Apache Spark in production, including job tuning, cluster sizing, and managing failures at scale
  • A bias for action—able to stay focused and prioritize effectively in an ambiguous environment
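
Since the JD calls out orchestration tools, here is a minimal sketch of the kind of landing-zone-to-curated pipeline described above, assuming Airflow 2.x with the TaskFlow API; the DAG id, schedule, and task bodies are placeholders.

```python
# Minimal sketch: a daily pipeline in Airflow 2.x using the TaskFlow API.
# The DAG id, schedule, and task bodies are placeholders.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def curated_events_pipeline():
    @task
    def extract() -> list[dict]:
        # Placeholder: pull a batch of raw events from the landing zone.
        return [{"event_id": "1", "event_type": "page_view"}]

    @task
    def load(events: list[dict]) -> None:
        # Placeholder: validate against the contract and write to the
        # curated lakehouse table.
        print(f"loaded {len(events)} events")

    load(extract())


curated_events_pipeline()
```

The same shape translates directly to Dagster or Prefect; the point is explicit task dependencies, scheduling, and retry semantics owned in code.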