Technical Program Manager - Infrastructure

Microsoft · Big Tech · Mountain View, CA +2 · Technical Program Management

A Technical Program Manager role for AI Infrastructure at Microsoft AI, focused on building and optimizing platforms for large-scale foundation model training, deployment, and serving. The role involves coordinating projects, collaborating with researchers and engineers, and driving progress in a 0-to-1 environment.

What you'd actually do

  1. Coordinate projects and programs related to AI/ML infrastructure (e.g., pre-training and post-training pipelines, inference and model-serving stacks), including end-to-end planning, timelines, milestones, performance metrics, and resource needs.
  2. Collaborate with product teams, engineers, researchers, and external partners to identify gaps and drive them to resolution or mitigation on schedule.
  3. Leverage data and analytics to identify opportunities for improvement, track progress, and measure the impact of quality and efficiency programs.
  4. Own the status of key infrastructure projects, proactively identifying risks and proposing solutions to ensure timely delivery.
  5. Communicate program strategies, progress, and results to executive leadership and key stakeholders, advocating for quality and efficiency within the team.

Skills

Required

  • Bachelor's Degree AND 6+ years of experience in engineering, product/technical program management, data analysis, or product development OR equivalent experience.
  • 3+ years of experience managing cross-functional and/or cross-team projects.

Nice to have

  • Bachelor's Degree AND 10+ years of experience in engineering, product/technical program management, data analysis, or product development OR equivalent experience.
  • 8+ years of experience managing cross-functional and/or cross-team projects.
  • 1+ year(s) of experience reading and/or writing code (e.g., sample documentation, product demos).

What the JD emphasized

  • Deeply understand the design, deployment, and optimization of large-scale infrastructure for AI/ML workloads.
  • Thrive in a scrappy, innovative 0-to-1 environment, managing high-stakes, time-sensitive, large-scale programs.
  • Advance the AI frontier responsibly.

Other signals

  • building and optimizing platforms, systems, and tools for large-scale training, deployment, and serving of foundation models
  • managing high-stakes, time-sensitive, large-scale programs
  • coordinating projects and programs related to AI/ML infrastructure (e.g., pre-training and post-training pipelines, inference and model-serving stacks)