Software Engineer, Model Inference

OpenAI · AI Frontier · San Francisco, CA · Scaling

Software Engineer focused on optimizing large AI models for high-volume, low-latency, high-availability production and research environments. The role involves partnering with researchers and engineers to bring AI technologies into production, improving the performance of the inference stack, and maximizing hardware utilization.

What you'd actually do

  1. Work alongside machine learning researchers, engineers, and product managers to bring our latest technologies into production.
  2. Work alongside researchers to enable advanced research through awesome engineering.
  3. Introduce new techniques, tools, and architecture that improve the performance, latency, throughput, and efficiency of our model inference stack.
  4. Build tools that give us visibility into bottlenecks and sources of instability, then design and implement solutions for the highest-priority issues.
  5. Optimize our code and fleet of Azure VMs to utilize every FLOP and every GB of GPU RAM in our hardware.
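Taken literally, "utilize every FLOP" is usually tracked with a back-of-envelope model FLOPs utilization (MFU) estimate. A minimal sketch, using hypothetical numbers (a 7B-parameter decoder, 1,000 tokens/s per GPU, and a peak of ~312 TFLOP/s, roughly an A100 in bf16):

```python
def model_flops_utilization(n_params: float, tokens_per_s: float, peak_flops: float) -> float:
    """Rough MFU: a decoder forward pass costs ~2 * n_params FLOPs per token."""
    achieved_flops = 2 * n_params * tokens_per_s
    return achieved_flops / peak_flops

# Hypothetical example: 7B params, 1,000 tokens/s, ~312 TFLOP/s peak.
util = model_flops_utilization(7e9, 1_000, 312e12)
print(f"MFU: {util:.1%}")
```

Single-digit MFU during decode is common because autoregressive generation is typically memory-bandwidth-bound, which is exactly why inference work obsesses over GPU RAM as much as FLOPs.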

Skills

Required

  • modern ML architectures
  • optimize ML performance for inference
  • own problems end-to-end
  • PyTorch
  • NVIDIA GPUs
  • CUDA
  • HPC technologies (InfiniBand, MPI, NVLink)
  • architecting, building, observing, and debugging production distributed systems
  • rebuilding or substantially refactoring production systems

Nice to have

  • performance-critical distributed systems

What the JD emphasized

  • high-volume, low-latency, and high-availability
  • performance, latency, throughput, and efficiency
  • optimize their performance, particularly for inference
  • architecting, building, observing, and debugging production distributed systems
  • performance-critical distributed systems
  • rebuild or substantially refactor production systems several times over due to rapidly increasing scale

Other signals

  • optimize models for production
  • optimize code and fleet of Azure VMs
  • PyTorch, NVIDIA GPUs, CUDA, HPC