Software Engineer, Inference – AMD GPU Enablement

OpenAI · AI Frontier · San Francisco, CA · Scaling

Software Engineer focused on scaling and optimizing OpenAI's inference infrastructure on emerging GPU platforms, specifically AMD accelerators. The role involves working across the stack from low-level kernel performance to high-level distributed execution, integrating internal model-serving infrastructure, debugging distributed inference workloads, and collaborating on high-performance GPU kernels and communication libraries.

What you'd actually do

  1. Own bring-up, correctness, and performance of the OpenAI inference stack on AMD hardware.
  2. Integrate internal model-serving infrastructure (e.g., vLLM, Triton) into a variety of GPU-backed systems.
  3. Debug and optimize distributed inference workloads across memory, network, and compute layers.
  4. Validate correctness, performance, and scalability of model execution on large GPU clusters.
  5. Collaborate with partner teams to design and optimize high-performance GPU kernels for accelerators using HIP, Triton, or other performance-focused frameworks.

Skills

Required

  • Experience writing or porting GPU kernels using HIP, CUDA, or Triton
  • Familiarity with communication libraries like NCCL/RCCL
  • Experience with distributed inference systems
  • Experience scaling models across fleets of accelerators
  • Experience solving end-to-end performance challenges across hardware, system libraries, and orchestration layers

Nice to have

  • Contributions to open-source libraries like RCCL, Triton, or vLLM
  • Experience with GPU performance tools (Nsight, rocprof, perf) and memory/comms profiling
  • Prior experience deploying inference on other non-NVIDIA GPU environments
  • Knowledge of model/tensor parallelism, mixed precision, and serving 10B+ parameter models

What the JD emphasized

  • AMD GPU Enablement
  • AMD accelerators
  • HIP
  • Triton
  • RCCL
  • low-level kernel performance
  • distributed inference workloads
  • high-performance GPU kernels

Other signals

  • scaling inference infrastructure
  • optimizing model inference