Data Engineer

OpenAI · AI Frontier · San Francisco, CA · Applied AI

OpenAI is seeking a Data Engineer to build and manage the data pipelines that integrate user event data into its data warehouse. The role develops canonical datasets for key product metrics, supports safety systems, and collaborates with research teams to help train new models. It requires expertise in data engineering technologies and distributed processing frameworks.

What you'd actually do

  1. Design, build, and manage our data pipelines, ensuring all user event data is seamlessly integrated into our data warehouse.
  2. Develop canonical datasets to track key product metrics including user growth, engagement, and revenue.
  3. Work collaboratively with various teams, including Infrastructure, Data Science, Product, Marketing, Finance, and Research, to understand their data needs and provide solutions.
  4. Implement robust and fault-tolerant systems for data ingestion and processing.
  5. Participate in data architecture and engineering decisions, bringing your strong experience and knowledge to bear.

Skills

Required

  • Python
  • Scala
  • Java
  • Hadoop
  • Flink
  • HDFS
  • S3
  • Airflow
  • Dagster
  • Prefect
  • Spark

What the JD emphasized

  • 3+ years of experience as a data engineer
  • 8+ years of any software engineering experience

Other signals

  • building our data pipelines
  • powering analyses, safety systems
  • collaborate closely with the researchers
  • help them train new models