Software Engineer, Accelerators

OpenAI · AI Frontier · San Francisco, CA · Scaling

Software Engineer focused on optimizing low-level software for AI accelerators, improving the efficiency and performance of large-scale training and inference for AI models, including LLMs and recommender systems.

What you'd actually do

  1. Prototype and enable OpenAI's AI software stack on new, exploratory accelerator platforms.
  2. Optimize large-scale model performance (LLMs, recommender systems, distributed AI workloads) for diverse hardware environments.
  3. Develop kernels, sharding mechanisms, and system scaling strategies tailored to emerging accelerators (see the sharding sketch after this list).
  4. Collaborate on optimizations at the model code level (e.g., PyTorch) and below to enhance performance on non-traditional hardware.
  5. Perform system-level performance modeling, debug bottlenecks, and drive end-to-end optimization.
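
The posting names sharding mechanisms only in passing. As a hedged illustration of what such a mechanism reduces to (a toy sketch assuming PyTorch; all names and shapes below are invented for illustration, not taken from the JD), a column-parallel linear layer can be simulated on a single host:

```python
# Toy column-parallel sharding sketch (assumes PyTorch; shapes and names
# are illustrative only). On real accelerators each shard would live on a
# separate device and the final concat would be an all-gather collective.
import torch

def sharded_linear(x: torch.Tensor, weight: torch.Tensor, num_shards: int) -> torch.Tensor:
    """Column-parallel linear: each shard owns a slice of the output features."""
    shards = weight.chunk(num_shards, dim=0)   # split output rows across "devices"
    partials = [x @ w.T for w in shards]       # each device computes its slice
    return torch.cat(partials, dim=-1)         # single-host stand-in for an all-gather

x = torch.randn(4, 256)
w = torch.randn(512, 256)
assert torch.allclose(sharded_linear(x, w, num_shards=4), x @ w.T, atol=1e-5)
```

Row-parallel variants trade the gather for a reduce; choosing between such layouts per layer and per interconnect is the sort of trade-off the "system scaling strategies" bullet gestures at.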

Skills

Required

  • 3+ years of experience working on AI infrastructure, including kernels, systems, or hardware-software co-design.
  • Hands-on experience with accelerator platforms for AI at data center scale (e.g., TPUs, custom silicon, exploratory architectures).
  • Strong understanding of kernels, sharding, runtime systems, or distributed scaling techniques.
  • Familiarity with optimizing LLMs, CNNs, or recommender models for hardware efficiency.
  • Experience with performance modeling, system debugging, and software stack adaptation for novel architectures (a roofline example follows this list).
  • Ability to operate across multiple levels of the stack, rapidly prototype solutions, and navigate ambiguity in early hardware bring-up phases.
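
"Performance modeling" in the required list is broad; a common first pass is a roofline estimate. The sketch below is a back-of-envelope version (the peak-FLOPs and bandwidth numbers are hypothetical placeholders, not any specific accelerator's specs) that classifies a GEMM as compute- or memory-bound:

```python
# Back-of-envelope roofline model for a GEMM. Hardware numbers are
# hypothetical placeholders: 300 TFLOP/s peak, 1.5 TB/s memory bandwidth,
# 2 bytes per element (fp16/bf16).

def roofline_gemm(m: int, n: int, k: int, bytes_per_elem: int = 2,
                  peak_flops: float = 300e12, mem_bw: float = 1.5e12) -> str:
    flops = 2 * m * n * k                               # multiply-accumulate count
    traffic = bytes_per_elem * (m * k + k * n + m * n)  # ideal read/write bytes
    intensity = flops / traffic                         # FLOPs per byte moved
    ridge = peak_flops / mem_bw                         # machine balance point
    bound = "compute-bound" if intensity > ridge else "memory-bound"
    t = max(flops / peak_flops, traffic / mem_bw)       # roofline time estimate
    return f"intensity={intensity:.1f} flop/B (ridge {ridge:.1f}): {bound}, ~{t*1e6:.0f} us"

print(roofline_gemm(4096, 4096, 4096))  # large square GEMM: compute-bound
print(roofline_gemm(1, 4096, 4096))     # GEMV-shaped (decode step): memory-bound
```

The two calls show why the same layer can be compute-bound during training and memory-bound during single-token inference, which is where kernel and runtime optimization effort tends to diverge.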

Nice to have

  • Exposure to mobile accelerators is welcome, though experience enabling data center-scale AI hardware is preferred.
  • Interest in shaping the future of AI compute through exploration of alternatives to mainstream accelerators.

What the JD emphasized

  • low-level software
  • large-scale training and inference
  • new compute platforms
  • performance optimizations
  • kernels
  • sharding strategies
  • runtime improvements
  • non-traditional hardware

Other signals

  • accelerates AI research