Networking Operating System Firmware Engineer

OpenAI · AI Frontier · San Francisco, CA · Scaling

This role focuses on building and maintaining the networking operating system (NOS) firmware for AI supercomputers. The engineer will work with SONiC NOS images, Linux kernel components, switch ASIC SDKs, and platform drivers to ensure the performance, reliability, and integration of the switching layer in large-scale AI infrastructure. Responsibilities include designing, developing, integrating, and debugging custom NOS images, bringing up new switch platforms, and collaborating with hardware and ASIC partners. The role also involves developing CI/build pipelines and supporting factory bring-up and deployment.

What you'd actually do

  1. Design, develop, and maintain custom SONiC NOS images for large-scale, bleeding-edge AI fabrics.
  2. Integrate and configure Linux kernel components, device drivers, switch ASIC SDKs, and SAI layers.
  3. Bring up new switch platforms (thermal/fan control, power monitoring, transceiver management, watchdogs, OSFP CMIS, LEDs, CPLDs, etc.).
  4. Extend and customize SONiC services for routing, telemetry, control-plane state, and distributed automation.
  5. Work with hardware teams to validate ASIC configurations, link bring-up, SerDes tuning, buffer profiles, and performance baselines.

Skills

Required

  • SONiC or comparable NOS stacks (FBOSS, Cumulus Linux, Arista EOS, Junos PFE-level integration, etc.)
  • Linux kernel
  • network device drivers
  • low-level OS internals
  • Broadcom / Marvell / NVIDIA / Intel ASIC SDKs and SAI implementations
  • C
  • C++
  • Python
  • L2/L3 forwarding
  • ECMP
  • RoCE
  • BGP
  • QoS
  • PFC
  • buffer tuning
  • telemetry
  • hardware platform bring-up
  • board-level debugging
  • CI/CD pipelines
  • distributed config/state management
  • large-scale automation
  • cross-functional problem solving

Nice to have

  • Rust
  • Go
  • OpenConfig gNMI interfaces
  • YANG data models
  • leading teams to deliver a project end to end

What the JD emphasized

  • custom SONiC NOS images
  • Linux kernel
  • switch ASIC SDKs
  • platform drivers
  • fleet automation
  • ASIC configurations
  • switch silicon SDK releases
  • platform requirements
  • complex issues spanning kernel, platform drivers, SONiC dockers, routing agents, orchestration services, hardware signals, and network topology
  • fleet-wide monitoring
  • automated lifecycle workflows
  • reproducible NOS builds
  • mass deployment
  • novel networking protocols and technologies
  • AI factory scale