Host Systems Software Engineer

OpenAI · AI Frontier · San Francisco, CA · Scaling

OpenAI is seeking an experienced systems software engineer to design and build the host software stack for its custom next-generation AI systems. The role involves working close to the hardware on performance-critical software, including Linux kernel drivers, high-throughput I/O paths, and system-scale networking. Responsibilities include platform bring-up, debugging, performance optimization, and developing software for PCIe, DMA, NICs, and accelerators. The role requires strong C/C++ skills and Linux systems fundamentals, with experience in areas such as kernel drivers, networking, or RDMA.

What you'd actually do

  1. Design, implement, and debug host-side systems software for AI infrastructure, including Linux kernel drivers and supporting userspace components.
  2. Build and optimize software paths for high-throughput, low-latency communication, including RDMA and related networking functionality.
  3. Develop software around PCIe, DMA, NICs, accelerators, memory movement, and device interaction.
  4. Bring up new hardware platforms and diagnose complex issues across kernel, firmware, networking, and hardware boundaries.
  5. Build tooling for integration, testing, diagnostics, observability, qualification, and performance characterization.

Skills

Required

  • C/C++
  • Python
  • Linux tooling
  • Linux systems fundamentals
  • debugging
  • PCIe
  • DMA
  • NICs
  • accelerators
  • memory movement
  • device interaction
  • RDMA
  • networking

Nice to have

  • Rust
  • RoCE
  • ibverbs
  • kernel networking
  • congestion-control concepts
  • ECN
  • DCQCN
  • peer-to-peer communication
  • SR-IOV
  • IOMMU
  • dma-buf
  • accelerator bring-up
  • NIC bring-up
  • SoC bring-up
  • custom hardware platform bring-up
  • profiling
  • high-throughput, low-latency systems optimization

What the JD emphasized

  • performance-critical software
  • low-latency communication
  • high-throughput
  • Linux kernel drivers
  • debugging across hardware and software boundaries
  • performance optimization