Software Engineer, Inference - Performance Optimization

OpenAI · AI Frontier · San Francisco, CA · Scaling

Software Engineer focused on optimizing inference performance across the application, model, and fleet layers. The role involves building performance models, analyzing inference workloads, improving the tooling used to identify bottlenecks, and partnering with other teams to implement improvements and project future needs. The core goal is to make inference faster and cheaper.

What you'd actually do

  1. Build and refine performance models that translate microbenchmark results into cost-to-serve estimates.
  2. Analyze inference workloads end to end across applications, models, and fleet infrastructure.
  3. Enhance tooling to identify bottlenecks across layers for latency and throughput.
  4. Partner with other teams to turn performance insights into concrete improvements and project how future changes affect inference.

Skills

Required

  • performance modeling
  • cost-to-serve estimation
  • inference workload analysis
  • bottleneck identification
  • distributed systems
  • model inference
  • hardware efficiency
  • performance profiling
  • benchmarking
  • analysis
  • optimization

Nice to have

  • kernels
  • accelerators
  • networking
  • fleet scheduling

What the JD emphasized

  • performance optimization
  • inference performance
  • model inference
  • inference workloads
  • latency
  • capacity
  • utilization
  • cost tradeoffs
  • distributed systems
  • hardware efficiency
  • performance profiling
  • benchmarking
  • analysis
  • optimization

Other signals

  • inference performance optimization
  • cost-to-serve estimation
  • distributed systems
  • model inference