Tl, Research Inference

OpenAI OpenAI · AI Frontier · San Francisco, CA · Research

This role focuses on building and optimizing high-performance inference systems for large-scale AI models, translating research ideas into efficient and scalable inference infrastructure. It involves owning core execution paths, distributed inference across multiple GPUs, and optimizing operators and kernels, with a strong emphasis on performance, correctness, and realism for research enablement.

What you'd actually do

  1. Design and build high-performance inference runtimes for large-scale AI models, with a focus on efficiency, reliability, and scalability.
  2. Own and optimize core execution paths, including model execution, memory management, batching, and scheduling.
  3. Develop and improve distributed inference across multiple GPUs, including parallelism strategies, communication patterns, and runtime coordination.
  4. Implement and optimize inference-critical operators and kernels informed by real-world workloads.
  5. Partner closely with research teams to ensure new model architectures are supported accurately and efficiently in inference systems.

Skills

Required

  • building production inference systems
  • GPU-centric performance engineering
  • multi-GPU or distributed systems
  • reasoning about inference pipelines
  • implementing research ideas within system constraints
  • solving complex systems problems at scale

Nice to have

  • hands-on technical ownership and execution

What the JD emphasized

  • production inference systems
  • GPU-centric performance engineering
  • multi-GPU or distributed systems
  • inference pipelines
  • research ideas and implement them within real system and performance constraints
  • hard, ambiguous systems problems that only emerge at scale

Other signals

  • inference systems
  • large-scale AI models
  • performance, memory, and scalability tradeoffs
  • high-performance inference infrastructure