Software Engineer, Productivity - Inference Runtime

OpenAI · AI Frontier · San Francisco, CA · Scaling

Software Engineer focused on developer productivity for OpenAI's Inference Runtime teams. The role involves scaling the engineering systems, safeguards, and developer workflows that keep model serving reliable, efficient, and safe across workloads. Key responsibilities include improving the tooling and infrastructure for deploy gates, release validation, and observability, in support of model launches and the safe rollout of inference optimizations.

What you'd actually do

  1. Improve systems that ensure inference engine releases are correct, performant, and regression-free by evolving tooling and infrastructure for deploy gate validation
  2. Bring rigor to release, validation, branching, and deployment processes across the inference stack
  3. Improve canary, async, and large-scale validation workflows for inference systems
  4. Harden CI, testing, and validation infrastructure so failures are actionable and trustworthy
  5. Reduce noisy or flaky failures caused by infrastructure instability, GPU scheduling, or test environment issues

Skills

Required

  • CI/CD systems
  • testing infrastructure
  • release tooling
  • developer productivity
  • large-scale build and validation systems
  • Python
  • debugging complex distributed systems
  • building automation

Nice to have

  • C++
  • inference experience

What the JD emphasized

  • high-ownership engineer
  • improving the tooling and infrastructure around deploy gates for inference engine images
  • systems that catch issues before they reach production
  • reduce noise from flaky or infrastructure-related test failures
  • improve automation around triage, ownership, debugging, and escalation when failures occur
  • improving observability, rollout safety, release automation, and developer self-service tooling
  • systems you build directly impact OpenAI’s ability to support new model launches, safely ship inference optimizations to the world, onboard new infrastructure providers, and operate one of the largest and most performance-sensitive inference platforms in the world
  • high-impact infrastructure where small regressions in correctness, latency, or reliability meaningfully affect production systems
  • building systems engineers can trust
  • strong instincts around developer productivity, testing, release engineering, and automation
  • deeply impactful inference environment
  • technically curious, comfortable navigating ambiguous, cross-functional operational problems
  • improve the reliability, safety, and developer experience of large-scale production infrastructure

Other signals

  • inference runtime
  • developer productivity
  • CI/CD infrastructure
  • release engineering
  • production readiness
  • inference systems reliability
  • model launches
  • inference optimizations
  • large-scale deployments
  • deploy gates
  • inference engine images
  • correctness
  • numerically sound
  • regressions
  • performant
  • time-to-first-token
  • time-between-tokens
  • observability
  • rollout safety
  • release automation
  • developer self-service tooling
  • large-scale inference platforms