Capacity TPM

Cerebras · Semiconductors · Headquarters +1 · Software

Cerebras is seeking a Technical Program Manager to lead capacity planning and fleet strategy for its Inference Service organization. The role involves managing the 6/12/26-week rolling capacity model, forecasting requirements, collaborating on new datacenter bring-up, and driving adoption of capacity planning tools. The TPM will also track incidents, manage SLA drops caused by capacity issues, and lead strategic initiatives for capacity expansion and fleet efficiency. The role requires experience in cloud infrastructure, large-scale ML serving, or hyperscaler capacity planning, along with comfort with the inference serving stack and data fluency.

What you'd actually do

  1. Build and maintain the 6 / 12 / 26-week rolling capacity model across every cluster.
  2. Collaborate with datacenter infrastructure and operations teams to support new datacenter bring-up and ensure production readiness.
  3. Partner closely with the SRE and product teams to run the weekly capacity review across customers, models, and clusters.
  4. Partner with the console engineering team to drive stakeholder adoption of the capacity planning and allocation tool built in-house, including user acceptance testing, issue resolution, change tracking, pilot testing, and deployment.
  5. Proactively identify and mitigate capacity bottlenecks, risks, and dependencies.

Skills

Required

  • Technical Program Management
  • Capacity Planning
  • Fleet Strategy
  • Inference Serving Stack
  • SQL
  • Grafana
  • Python
  • Flux
  • Cross-functional program leadership
  • AI Accelerator Fleet Operations

Nice to have

  • Product Operations experience
  • Large-scale ML serving experience
  • Hyperscaler capacity planning experience

What the JD emphasized

  • 5+ years of technical program management (TPM) or product operations experience in cloud infrastructure, large-scale ML serving, or hyperscaler capacity planning
  • Comfort with the inference serving stack: model replicas, batching, prefill/decode, KV cache, accelerator scheduling
  • Direct experience with AI accelerator fleet operations, such as Habana, TPU pods, Inferentia, or Trainium

Other signals

  • Capacity planning for AI inference platform
  • Fleet strategy for inference services
  • Maximizing utilization of AI accelerators