Technical Program Manager II, Infrastructure and Capacity, Platforms and Devices

Google · Big Tech · Mountain View, CA +1

Technical Program Manager II for Chrome Infrastructure and Capacity, focused on managing the hardware ecosystem, capital expenditure, and resource allocation for compute, storage, and TPUs. The role involves stabilizing demand, establishing governance, reducing emergency funding requests, and working with SRE, Planning, Data Science, and Research Science teams to keep capacity management data-driven and efficient.

What you'd actually do

  1. Lead the consolidation of Chrome's hardware capacity ecosystem, creating a centralized, scalable function to oversee significant capital expenditure (CapEx) and resource allocation, and drive optimization of our technical infrastructure strategy.
  2. Provide tactical emergency oversight for immediate compute and storage escalations, and develop sustainable processes that reduce the need for emergency funding requests.
  3. Serve as the primary liaison between Chrome Program Managers and Engineers and the SRE and PARM teams, actively transitioning and scaling existing workflows to establish a dedicated, timezone-aligned support model for infrastructure operations.
  4. Serve as an organizational team multiplier for Chrome's data science team: help translate data insights into faster executive action, standardize infrastructure, and improve coordination across engineering and product.

Skills

Required

  • Bachelor's degree in a technical field, or equivalent practical experience.
  • 2 years of experience in program management.
  • Experience in technical program management, leading cross-functional programs within technical infrastructure, data science, or engineering productivity domains.
  • Experience in capacity planning or resource management.
  • Experience using data analytics or data science methodologies to drive business-critical decision-making.

Nice to have

  • 2 years of experience managing cross-functional or cross-team projects.
  • Experience managing technical programs within Machine Learning (ML) infrastructure, specifically involving Tensor Processing Units (TPUs), large-scale storage systems, or cloud hardware operations.
  • Experience in infrastructure management, capacity planning, cloud operations, or working directly with Site Reliability Engineering (SRE) teams.

What the JD emphasized

  • technical infrastructure strategy
  • capacity planning
  • resource management
  • Machine Learning (ML) infrastructure
  • Tensor Processing Units (TPUs)
  • large-scale storage systems
  • cloud hardware operations