Compute Manager, Serving, DeepMind

Google · Mountain View, CA

This role is responsible for coordinating the GenAI serving process and supporting compute needs for model training. The individual will work with teams on their requests, improve and automate workflows, and manage computational power for frontier model development. Key responsibilities include debugging infrastructure, advising on compute governance, identifying launch bottlenecks, and developing internal tooling to streamline processes.

What you'd actually do

  1. Develop a grasp of the infrastructure stack to debug hardware and software issues, adjust technical configurations, and advise teams on how to best govern their compute allocations.
  2. Request and coordinate launches, guiding teams and managing expectations through the milestones required to receive approvals.
  3. As data sources are discovered, write extensive documentation to help scale impact across teams long-term and help create a single source of truth.
  4. Identify bottlenecks in the launch workflow and assist in developing internal tooling (e.g., dashboards, scripts, agents) to streamline work.
  5. Use data, logic, and concise communication to bring clarity to complex discussions with cross-functional teams.

Skills

Required

  • Experience with technical program or project management in a software engineering or cloud environment
  • Proficiency with Sheets, Looker, SQL, and Python for data extraction, modeling, visualization, and automation
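The SQL-plus-Python requirement above could look something like the following minimal sketch, which uses an in-memory SQLite table to surface pending launch requests by team. All table, column, and team names here are hypothetical placeholders, not anything from the actual role:

```python
import sqlite3

# Hypothetical launch-request data; real sources (Looker, internal DBs) would differ.
ROWS = [
    ("team-a", "approved", 3),
    ("team-a", "pending", 7),
    ("team-b", "pending", 12),
    ("team-b", "approved", 5),
]

def pending_by_team(rows):
    """Load rows into an in-memory SQLite table and rank teams by longest pending wait."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE launches (team TEXT, status TEXT, days_waiting INTEGER)")
    conn.executemany("INSERT INTO launches VALUES (?, ?, ?)", rows)
    cur = conn.execute(
        "SELECT team, COUNT(*) AS n, MAX(days_waiting) AS longest "
        "FROM launches WHERE status = 'pending' "
        "GROUP BY team ORDER BY longest DESC"
    )
    return cur.fetchall()

if __name__ == "__main__":
    for team, n, longest in pending_by_team(ROWS):
        print(f"{team}: {n} pending, longest wait {longest} days")
```

The same query-then-summarize pattern extends naturally to the visualization and automation parts of the requirement (e.g., feeding the result into a dashboard or a scheduled report).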

Nice to have

  • Experience with technical program management in AI or data-centric projects
  • Experience in product management and engineering
  • Knowledge of Cloud, AI, and data solutions
  • Ability to work cross-functionally with multiple teams and stakeholders
  • Excellent relationship-building, collaboration, and negotiation skills
  • Excellent communication, project-management, and problem-solving skills

What the JD emphasized

  • coordinating the GenAI serving process
  • supporting the compute needs for Gemini training
  • debug hardware and software issues
  • identify bottlenecks in the launch workflow
  • developing internal tooling

Other signals

  • improving and automating these workflows
  • manage the computational power that enables our research engineering teams to develop our frontier models
  • adjust technical configurations
  • advise teams on how to best govern their compute allocations