Engineering Manager, ML Infrastructure

Google · Big Tech · Sunnyvale, CA +1

Engineering Manager for ML Infrastructure, focusing on fleet-wide scheduling for Alphabet's ML workloads. The role involves leading a team, setting the technical roadmap, and making scheduling for ML tasks across production machines efficient, reliable, and easy to use, in support of Vertex AI and Gemini models.

What you'd actually do

  1. Lead our new Workload Optimization (WO) team: set its technical goals and roadmap, and drive its key features.
  2. Collaborate closely with teams across machine learning (ML) and with our product area customers to ensure successful execution.
  3. Shape the team's culture and processes, identify new opportunities, and translate our broader strategy into concrete priorities and projects.
  4. Coach and provide career guidance to your reports, improve our engineering practices, and influence technical direction across the organization.
  5. Navigate open-ended issues and actively contribute to the team's engineering efforts as a technical lead.

Skills

Required

  • software development
  • developing infrastructure, distributed systems or networks
  • compute technologies, storage or hardware architecture
  • technical leadership role
  • people management or team leadership role

Nice to have

  • Master's degree or PhD in Computer Science, or a related technical field
  • working in a matrixed organization
  • end-to-end Machine Learning (ML) development lifecycle and infrastructure
  • communication and cross-team collaboration skills

What the JD emphasized

  • fleet-wide scheduling for all Alphabet Machine Learning (ML) workloads
  • scheduling work on almost all production machines
  • ML, Systems, and Cloud AI (MSCA) organization
  • hyperscale computing
  • Vertex AI
  • Gemini models
  • end-to-end Machine Learning (ML) development lifecycle and infrastructure

Other signals

  • ML infrastructure
  • fleet-wide scheduling
  • production machines
  • hyperscale computing
  • Vertex AI
  • Gemini models