Principal Software Engineer, Compute Provisioning

Roblox · Consumer · San Mateo, CA · Software Engineering

Lead the systems that provision and rebuild Roblox's global fleet across bare metal and cloud, including new GPU and AI infrastructure. Architect and extend MAPI, the unified Machine API that abstracts bare metal, GPU hosts, and cloud instances behind a single global interface. Ship fleet-wide maintenance operations to hundreds of thousands of machines through MAPI. Drive best-in-class provisioning performance. Evaluate and integrate new hardware platforms, including GPU servers and AI accelerators, into the provisioning pipeline.

What you'd actually do

  1. Lead the Machine Bootstrap pod in building and evolving provisioning and fleet management at massive scale.
  2. Architect and extend MAPI, the unified Machine API that abstracts bare metal, GPU hosts, and cloud instances behind a single global interface.
  3. Ship fleet-wide maintenance operations (BIOS updates, firmware updates, configuration changes) to hundreds of thousands of machines through MAPI.
  4. Drive best-in-class provisioning performance: a full machine rebuild from scratch in minutes.
  5. Evaluate and integrate new hardware platforms, including GPU servers and AI accelerators, into the provisioning pipeline.

Skills

Required

  • distributed systems
  • infrastructure
  • Go
  • C/C++
  • Rust
  • systems-level programming languages
  • large-scale distributed systems
  • high-performance automation
  • developer-friendly APIs

Nice to have

  • bare-metal concepts
  • PXE/iPXE
  • DHCP
  • BMC/IPMI/Redfish
  • OS imaging
  • low-level systems experience
  • modern server hardware
  • GPU servers
  • AI accelerators
  • cloud infrastructure

What the JD emphasized

  • massive scale
  • hundreds of thousands of machines
  • best-in-class provisioning performance
  • high-performance automation at fleet scale