Software Engineer, Data Infrastructure

OpenAI · AI Frontier · San Francisco, CA · Applied AI

This Software Engineer role focuses on building and operating data infrastructure that supports massive compute fleets and storage systems, designed for high performance and scalability. The work involves designing, building, and operating the next generation of data infrastructure at OpenAI: scaling and hardening big data compute and storage platforms, building and supporting high-throughput streaming systems, building and operating low-latency data ingestion, enabling secure and governed data access for ML and analytics, and designing for reliability and performance at extreme scale.

What you'd actually do

  1. Design, build, and maintain data infrastructure systems such as distributed compute, data orchestration, distributed storage, streaming infrastructure, and machine learning infrastructure, while ensuring scalability, reliability, and security
  2. Ensure our data platform can scale by orders of magnitude while remaining reliable and efficient
  3. Accelerate company productivity by empowering your fellow engineers & teammates with excellent data tooling and systems
  4. Collaborate with product, research, and analytics teams to build the technical foundations and capabilities that unlock new features and experiences
  5. Own the reliability of the systems you build, including participation in an on-call rotation for critical incidents

Skills

Required

  • 4+ years in data infrastructure engineering OR 4+ years in infrastructure engineering with a strong interest in data
  • Experience building and operating scalable, reliable, secure systems
  • Comfort with ambiguity and rapid change
  • Willingness to learn and fill in missing skills
  • Ability to share learnings clearly and concisely

What the JD emphasized

  • massive compute fleets
  • exabyte-scale architecture
  • high-throughput streaming platforms
  • low-latency data ingestion
  • extreme scale
  • full lifecycle ownership
  • production operations
  • on-call participation
  • Spark
  • Kafka
  • Flink
  • Airflow
  • Trino
  • Iceberg
  • Terraform
  • large-scale distributed systems
  • data infrastructure problems in the AI space

Other signals

  • data infrastructure
  • ML feature engineering
  • AI-assisted data workflows