Senior Software Engineer, Compute Fleet Management

Roblox · Consumer · San Mateo, CA · Software Engineering

Senior Software Engineer role building products that streamline the provisioning of GPUs and compute resources and improve AI capacity delivery, uptime, and OS security. The focus is on Golang controllers for Kubernetes and gRPC APIs that abstract data center complexity for Roblox Builders.

What you'd actually do

  1. Develop and maintain a fleet-wide machine daemon for efficient hardware/software monitoring, runtime updates, and secure machine access.
  2. Write Golang controllers for Roblox's fleet lifecycle, ensuring smooth functioning at all times.
  3. Handle OS installation, firmware provisioning, and secure recycling processes.
  4. Build and maintain a robust framework for HW/OS/Kernel validation and performance tuning.
  5. Provide abstraction across cloud and on-premise systems, supporting stateful services.

Skills

Required

  • 5+ years of industry experience
  • Strong proficiency in Go, C/C++, Rust, or other systems-level programming languages
  • Golang controllers for Kubernetes
  • gRPC APIs

Nice to have

  • No experience required in this specific area of infrastructure, but being excited about the problem space is a must.
  • Prefer building autonomous systems over ops and repetitive tasks.
  • Like and understand the importance of documentation for large-scale systems.
  • Customer-, team-, and quality-oriented.
  • Like getting things done.

What the JD emphasized

  • Building products with a customer-centric approach.
  • Strong consideration for production health and experience working on reliable, sustainable production systems.
  • Care deeply about production health and reliable, sustainable production systems.