Senior Product Manager, Jobs Platform

Roblox · Consumer · San Mateo, CA · Product Management

Roblox is seeking a Senior Product Manager to lead its Jobs Platform, an orchestration system for microservices powering AI inference. The role involves defining the vision for deploying, running, and scaling AI and other services, evolving the platform's technical core, improving developer experience, and architecting intelligent scaling solutions using AI-powered prediction. The ideal candidate has experience with distributed systems, Kubernetes, GPU architecture, and AI model development, with a focus on building developer-facing platforms.

What you'd actually do

  1. Own Jobs Platform end-to-end - set the multi-year vision for how Roblox engineers deploy, run, and scale AI and other services across our Core and Edge Datacenters and the cloud.
  2. Power Roblox's AI future - build the platform that brings frontier models and next-gen AI workloads to life, with the primitives, scheduling guarantees, and resource classes AI teams need to move fast.
  3. Evolve the platform's technical core - define how Roblox prioritizes, isolates, and preempts workloads across a shared fleet, and shape the entire system across the global control plane, multi-region orchestration, distributed systems primitives, capacity/state management, DR/failover, and operator-based Kubernetes extensions.
  4. Make developer experience the reason teams choose Jobs Platform - deliver integrated tooling (CI/CD, telemetry, profiling, tracing), manage robust SLOs via real-time fleet signals (health, queue depth, scheduling efficiency), and engineer a frictionless onboarding experience for teams.
  5. Architect intelligent, predictive scaling - advance Multi-Cloud Bursting from reactive to proactive with AI-powered prediction, and partner with Data Science and Finance on demand forecasting, cost-to-serve analytics, and AI-driven resource optimization.

Skills

Required

  • Product management experience building resource orchestration platforms, workload management platforms, or large-scale distributed systems
  • Track record of building developer-facing platforms that engineers genuinely love to use
  • Deep understanding of the modern software development lifecycle and developer tooling
  • Experience abstracting complex infrastructure into clean, maintainable platform schemas
  • Experience building and operating Kubernetes and enterprise-grade service mesh at scale
  • Understanding of developer pain points around control plane mechanics such as etcd scaling and API server bottlenecks
  • Experience shipping custom Operators, Controllers, and CRDs that go beyond vanilla Kubernetes
  • Familiarity with GPU/accelerator architecture and the complex scheduling challenges that come with it
  • Background in AI model development, training, and inference
  • Builder mindset
  • Experience building workload management or resource orchestration platforms in AWS, Google Cloud Platform (GCP), Azure, or other cloud providers

Nice to have

  • Kernel-level experience or familiarity with custom kernel drivers
  • Experience building agentic systems for workload management or infrastructure

What the JD emphasized

  • AI Inference across Roblox
  • scale the platform to become the default runtime for critical Roblox services worldwide
  • build the platform that brings frontier models and next-gen AI workloads to life
  • developer-facing platforms
  • Kubernetes and enterprise-grade service mesh at scale
  • developer pain points around control plane mechanics
  • custom Operators, Controllers, and CRDs
  • GPU/accelerator architecture
  • AI model development, training, inference
  • AI-powered prediction

Other signals

  • Architect intelligent, predictive scaling - advance Multi-Cloud Bursting from reactive to proactive with AI-powered prediction
  • partner with Data Science and Finance on demand forecasting, cost-to-serve analytics, and AI-driven resource optimization