Staff Software Engineer, ML Fleet Systems

Google · Big Tech · Sunnyvale, CA +2

Staff Software Engineer role focused on building and providing technical leadership for ML Fleet Systems. The work covers general software engineering tasks (coding, testing, and deploying services) as well as operational support for AI and Infrastructure teams at Google. The role contributes to the development of AI models and computing power at scale.

What you'd actually do

  1. Design, develop, test, deploy, maintain, and enhance software features and components as part of a team.
  2. Analyze and resolve technical issues of moderate complexity, contributing to design choices and implementation details.
  3. Write high-quality, maintainable, and well-tested code, adhering to engineering best practices and participating actively in code reviews.
  4. Collaborate effectively with other engineers, and potentially with Product Managers and UX Designers, to achieve shared goals and deliver projects.
  5. Contribute to the operational health and reliability of ML Fleet systems, including debugging, bug fixing, monitoring, and continuous improvement efforts.

Skills

Required

  • software development
  • Java
  • Python
  • Go
  • C++
  • software design
  • software architecture
  • testing
  • launching software products

Nice to have

  • resource management systems
  • Kubernetes
  • Flex
  • cluster management
  • scheduling algorithms
  • machine learning hardware accelerators
  • TPUs
  • GPUs
  • lifecycle management
  • communication
  • teamwork
  • problem solving
  • investigative skills

Other signals

  • ML Fleet Systems
  • AI and Infrastructure team
  • TPUs
  • Vertex AI