Engineering Manager, Google Distributed Cloud and Sovereign Cloud

Google · Big Tech · Sunnyvale, CA +2

Manager for engineering teams building and deploying AI infrastructure for Google Distributed Cloud and Sovereign Cloud, focusing on GenAI, air-gapped runtimes, and optimizing inference performance in regulated environments.

What you'd actually do

  1. Set team priorities supporting GDC and Sovereign Cloud goals, specifically for GenAI and air-gapped runtimes. Align direction and decision-making across distributed teams to ensure high-velocity platform execution.
  2. Scale and mentor an engineering organization specialized in container orchestration and AI infrastructure. Provide continuous coaching to help individuals navigate technical complexities and career growth within highly regulated environments.
  3. Architect the mid-term technical roadmap for GKE AI, optimizing for accelerator (GPU/TPU) efficiency and cost-per-token. Evolve systems to meet future infrastructure needs for autonomous, self-managing cloud operations in restricted settings.
  4. Design and vet complex system architectures, balancing performance and sovereignty. Advocate for engineering best practices, style, testability, and efficiency to ensure secure and auditable deployments.

Skills

Required

  • software development
  • large-scale infrastructure
  • distributed systems
  • networks
  • compute technologies
  • storage
  • hardware architecture
  • technical leadership
  • people management
  • container orchestration platforms
  • machine learning infrastructure
  • hardware accelerators
  • large-scale AI model serving platforms

Nice to have

  • Master's degree or PhD
  • experience in a complex, matrixed organization
  • building or managing large-scale AI/ML infrastructure
  • Generative AI inference
  • Sovereign Cloud architecture
  • air-gapped environments
  • FedRAMP
  • StateRAMP
  • ITIL
  • Kubernetes-based solutions
  • hybrid, on-premises, or edge computing environments

What the JD emphasized

  • highly regulated environments
  • complex regulated frameworks
  • complex regulatory compliance standards

Other signals

  • leading teams building AI infrastructure
  • optimizing AI inference performance and efficiency
  • deploying AI capabilities in regulated environments