Software Engineer, Inference - Multi Modal

OpenAI · AI Frontier · San Francisco, CA · Scaling

Software engineer focused on building and optimizing inference infrastructure for OpenAI's multimodal models (image, audio) at scale, ensuring high-throughput, low-latency delivery and enabling research-to-production workflows.

What you'd actually do

  1. Design and implement inference infrastructure for large-scale multimodal models.
  2. Optimize systems for high-throughput, low-latency delivery of image and audio inputs and outputs.
  3. Enable experimental research workflows to transition into reliable production services.
  4. Collaborate closely with researchers, infra teams, and product engineers to deploy state-of-the-art capabilities.
  5. Contribute to system-level improvements, including GPU utilization, tensor parallelism, and hardware abstraction layers.

Skills

Required

  • Experience building and scaling inference systems for LLMs or multimodal models.
  • Hands-on work with GPU-based ML workloads and an understanding of the performance dynamics of large models, especially with complex data such as images or audio.
  • Comfort with systems that span networking, distributed compute, and high-throughput data handling.
  • Ownership of problems end-to-end.

Nice to have

  • Experience working with image generation or audio synthesis models in production.
  • Exposure to distributed ML training or system-efficient model design.
  • Familiarity with inference tooling such as vLLM, TensorRT-LLM, or custom model-parallel systems.

What the JD emphasized

  • Serving OpenAI’s multimodal models at scale
  • High-throughput, low-latency delivery of image and audio inputs and outputs
  • GPU utilization
