Inference Technical Lead, Sora

OpenAI · AI Frontier · San Francisco, CA · Research

OpenAI is seeking a GPU Inference Engineer to optimize model serving efficiency, inference performance, and scalability for its Sora multimodal foundation models. The role involves kernel-level systems work, data movement, and low-level performance tuning to support the growth and reliability of its AI systems.

What you'd actually do

  1. Carry out engineering work to improve model serving, inference performance, and system efficiency
  2. Drive optimizations from a kernel and data movement perspective to improve system throughput and reliability
  3. Partner closely with research and product teams to ensure our models perform effectively at scale
  4. Design, build, and improve critical serving infrastructure to support Sora’s growth and reliability needs

Skills

Required

  • deep expertise in model performance optimization, particularly at the inference layer
  • strong background in kernel-level systems, data movement, and low-level performance tuning
  • experience scaling high-performing AI systems that serve real-world, multimodal workloads
  • ability to navigate ambiguity, set technical direction, and drive complex initiatives to completion

Nice to have

  • multimodal capabilities

What the JD emphasized

  • critical to scaling the team’s broader goals
  • directly enable leadership to focus on higher-leverage initiatives by building a stronger technical foundation

Other signals

  • GPU Inference Engineer
  • model serving efficiency
  • inference performance and scalability
  • kernel-level systems
  • low-level performance tuning