Technical Program Manager, Platform

Scale AI · Data AI · New York, NY +1 · Enterprise Engineering

This role is for a Technical Program Manager focused on the Scale Generative AI Platform (SGP). The TPM will partner with engineering teams to accelerate the development and maturity of SGP, owning strategic alignment and end-to-end execution of critical infrastructure initiatives. Responsibilities include lifecycle and platform delivery, cross-functional GenAI alignment, technical translation, risk mitigation, driving developer velocity, and reporting on adoption metrics. The role requires experience building and shipping technical products or platforms, expertise in core engineering infrastructure, and a foundational understanding of Generative AI infrastructure.

What you'd actually do

  1. Lead strategic planning and high-velocity execution for SGP core capabilities (orchestration layers, model serving, APIs). Manage features from technical scoping and architecture design through production launch.
  2. Drive execution and manage complex technical dependencies across Systems Engineering, Core ML, Research, and Product teams to deliver unified SGP capabilities with architectural consistency.
  3. Translate complex infrastructure metrics (LLM inference optimization, GPU utilization, compute orchestration) into actionable roadmaps. Map demands like multi-tenancy, data privacy, and isolation into platform features.
  4. Proactively identify, track, and mitigate technical risks unique to massive-scale GenAI infrastructure and global SGP deployments, maintaining momentum despite fast-evolving AI frameworks.
  5. Establish lightweight agile processes that empower engineers to ship fast without breaking core systems. Define and enforce clear SLOs and performance benchmarks to guarantee production-grade reliability for clients.

Skills

Required

  • 5+ years of experience as a Technical Program Manager, Product Manager, or Software Engineer.
  • Platform Domain Expertise: 3+ years of dedicated experience managing programs focused directly on core engineering infrastructure, cloud-native ecosystems (AWS/GCP), container orchestration (Kubernetes), or distributed systems.
  • AI/ML Infrastructure Literacy: Foundational understanding of the infrastructure required for the Generative AI lifecycle, including high-throughput data pipelines, GPU/CPU cluster utilization, or model training/evaluation setups.
  • Masterful Communication: Proven track record of presenting to and influencing executive-level stakeholders, with the ability to translate complex technical/architectural challenges into clear business impacts.
  • Execution Excellence: Advanced proficiency with iterative development methodologies and modern project management tooling (Linear, Jira, etc.) applied to foundational infrastructure environments.

Nice to have

  • Engineering Roots: Strong software engineering fundamentals, with prior professional experience as a Software Engineer, DevOps Engineer, or Data Developer before transitioning into program management.
  • Platform Adoption Track Record: Proven success driving the internal adoption of technical platforms, SDKs, or APIs across disparate, fast-moving product lines.
  • Data-Centric AI Familiarity: Direct experience working with large-scale data quality pipelines, distributed vector databases, or specialized AI inference engines (e.g., Triton, Ray).

What the JD emphasized

  • Having built and shipped technical products or platforms from scratch