Operational Excellence Leader, Infrastructure and Data Center

Google · Big Tech · Mountain View, CA +3

This role focuses on operational excellence in data centers, specifically managing and optimizing AI infrastructure. AI appears in the context of infrastructure management and tooling (AI agents), but the role does not build or research AI models; it ensures the operational efficiency and reliability of the infrastructure that supports AI. Responsibilities include managing third-party fleets, driving continuous improvement, and defining technical roadmaps for data center operations.

What you'd actually do

  1. Drive forward-looking approaches to anticipate and address the most challenging AI infrastructure problems in Data Center Operations (DCOps), specific to Google’s third-party managed fleet.
  2. Act as the executive technical triage point for critical operational issues, assessing and delegating resolution to the appropriate engineering or operational teams to drive permanent change.
  3. Serve as the key technical lead ensuring program excellence, including driving continuous improvement and ensuring agreement requirements are met via tooling and AI agents.
  4. Define and drive the technical roadmap for next-generation third-party DCOps, focusing on solutions that integrate with Google's first-party managed fleet.
  5. Act as a trusted and neutral arbiter to drive consensus and alignment on technical approaches and strategies across cross-functional work streams.

Skills

Required

  • electrical, mechanical/HVAC, or controls experience in an industrial/commercial environment
  • process improvement and performance improvement plans
  • data center operations management
  • third-party vendor management
  • data center critical infrastructure management

Nice to have

  • design, construction, commissioning, or operation of hyperscale, mission-critical data center infrastructure
  • contract management
  • defining and delivering technical roadmaps for global infrastructure

What the JD emphasized

  • AI infrastructure
  • AI agents
  • third-party managed fleet
  • critical operational issues
  • technical roadmap
  • hyperscale computing