Data Engineer, Scaling Analytics

OpenAI · AI Frontier · San Francisco, CA · Scaling

Data Engineer role focused on building and scaling analytical foundations for OpenAI's AI infrastructure organization, supporting operations, capacity planning, and supply chain through data pipelines, models, and reporting systems.

What you'd actually do

  1. Design, build, and maintain scalable data pipelines supporting infrastructure deployment, operations, capacity planning, and supply chain functions.
  2. Develop trusted datasets and reporting systems that provide visibility into hardware inventory, deployment status, site readiness, capacity utilization, and operational performance.
  3. Partner with cross-functional stakeholders to define metrics, establish data standards, and improve decision-making across infrastructure organizations.
  4. Create scalable data models that enable consistent reporting and analytics across multiple data sources and operational systems.
  5. Improve data quality, lineage, observability, and governance practices across critical infrastructure datasets.

Skills

Required

  • SQL
  • Python
  • Data modeling
  • ETL/ELT
  • Data warehousing
  • Orchestration frameworks
  • Data quality
  • Observability

Nice to have

  • Infrastructure operations support
  • Hardware operations support
  • Supply chain support
  • Manufacturing support
  • Logistics support
  • Capacity planning support
  • Large-scale operational telemetry
  • Business-critical reporting
  • Spark
  • dbt
  • Executive reporting
  • Fast-paced environments
  • Ambiguous environments

What the JD emphasized

  • 5+ years of experience building and maintaining production data pipelines and analytical systems
  • Strong proficiency in SQL and experience designing scalable data models
  • Proficiency in Python or another programming language commonly used for data engineering
  • Experience working with modern data warehouses (e.g., Snowflake, BigQuery, Redshift) and orchestration frameworks (e.g., Airflow, Dagster)
  • Experience designing reliable ETL/ELT workflows with a focus on maintainability, performance, and operational excellence
  • Experience partnering with cross-functional stakeholders to translate business requirements into technical solutions
  • Experience implementing data quality checks, monitoring, and observability practices in production environments