Senior Software Engineering Manager, AI/ML, Compute Infrastructure

Google · Big Tech · Seattle, WA (+1 more location)

Senior Software Engineering Manager for Google's Cloud GPU team, focused on building and maintaining an industry-leading GPU fleet and AI Platform. The role involves managing engineering talent, executing technical roadmaps for GPU infrastructure, integrating new GPU architectures, and overseeing the lifecycle of accelerator solutions for AI workloads. The position underpins Google's AI innovation by supporting large-scale AI models and services.

What you'd actually do

  1. Execute technical roadmaps for the GPU ecosystem focused on GPU resilience, anticipating market shifts to keep Google Cloud at the forefront of AI infrastructure.
  2. Collaborate with engineering teams to integrate new GPU architectures into Google Compute Engine (GCE) so that new hardware becomes available to workloads quickly.
  3. Grow and lead engineering talent in the GPU space.
  4. Oversee the lifecycle of accelerator solutions, ensuring consistent performance and stability across diverse user applications.
  5. Serve as a technical advocate during critical issues, collaborating directly with customers to resolve challenges and translating their feedback into platform enhancements.

Skills

Required

  • software development
  • technical project strategy
  • ML design
  • ML infrastructure optimization
  • model deployment
  • model evaluation
  • data processing
  • debugging
  • fine-tuning
  • people management
  • team leadership
  • technical leadership
  • speech/audio
  • reinforcement learning
  • ML field specialization

Nice to have

  • Master’s degree or PhD in Engineering, Computer Science, or a related technical field
  • complex, matrixed organization experience
  • cloud infrastructure experience (e.g., GPU)

What the JD emphasized

  • 7 years of experience leading technical project strategy, ML design, and optimizing ML infrastructure (e.g., model deployment, model evaluation, data processing, debugging, fine-tuning).
  • 5 years of experience with one or more of the following: speech/audio (e.g., technology duplicating and responding to the human voice), reinforcement learning (e.g., sequential decision making), ML infrastructure, or specialization in another ML field.

Other signals

  • AI innovation
  • GPU fleet and AI Platform
  • AI workloads
  • accelerated computing
  • AI and Infrastructure
  • AI models
  • TPUs
  • Vertex AI
  • hyperscale computing