Principal Software Engineer, Cluster Lifecycle

Roblox · Consumer · San Mateo, CA · Software Engineering

Roblox is seeking a Principal Software Engineer to build and evolve the infrastructure for its private cloud, which manages millions of containers serving hundreds of millions of requests per second. The role focuses on creating a sustainable and reliable compute primitive across all backend environments and on working closely with other teams to develop new features and support new workloads. The ideal candidate has extensive experience in the Kubernetes ecosystem, strong Go proficiency, and enjoys working on large-scale distributed systems with a focus on automation, observability, and reliability.

What you'd actually do

  1. Build and evolve a [cell](https://blog.roblox.com/2023/12/making-robloxs-infrastructure-efficient-resilient/) primitive for Roblox that runs the backends for the vast majority of Roblox’s compute workload.
  2. Work closely with other teams in Compute and across the company to develop new features, add support for new workloads, and define the right cross-system APIs as we expand the footprint of ‘cells’.
  3. Safely and reliably manage a critical at-scale system.

Skills

Required

  • 8+ years of experience
  • Experience working in the Kubernetes ecosystem
  • Strong proficiency in Go or other well-structured programming languages
  • Enjoy working on critical, large-scale, cross-platform, multi-tenant distributed systems
  • Prefer building systems automation over performing repetitive operational tasks
  • An appreciation for working on observability and reliability to build long-term, sustainable systems

Nice to have

  • Prior experience building Kubernetes operators or building/running Kubernetes distributions

What the JD emphasized

  • critical at-scale system