Staff Software Engineer, AI and Infrastructure

Google · Big Tech · Sunnyvale, CA +1

This role is for a Staff Software Engineer on the Borglet Infrastructure team within Google Cloud's ML, Systems, & Cloud AI (MSCA) organization. The team develops core pieces of Borglet, Google's node agent for managing user processes, focusing on ML/AI Infrastructure (GPU/TPUs in Borg), Security, Capacity Fungibility, and Warp Space. The role involves designing, implementing, and analyzing computer systems and their interactions with kernel and hardware, collaborating with partner teams, solving ambiguous problems, developing junior engineers, and strategic planning. While the role is within an AI-focused organization and deals with AI infrastructure, the core responsibility is building and managing the underlying infrastructure (Borglet) rather than directly developing or shipping AI models themselves. The role emphasizes large-scale infrastructure and distributed systems.

What you'd actually do

  1. Design, implement, and analyze computer systems and their interactions with the kernel and hardware.
  2. Collaborate with partner teams and users across Google, e.g., the Borg team, ML teams, HW platform teams, SRE teams, and Google's internal and Cloud users.
  3. Solve ambiguous and high impact problems. Develop junior engineers on the team.
  4. Drive strategic planning and tactical execution in complex projects. Coordinate across partner teams in Warsaw.

Skills

Required

  • Bachelor's degree or equivalent practical experience.
  • 8 years of experience programming in C++.
  • 5 years of experience testing and launching software products.
  • 5 years of experience building and developing large-scale infrastructure, distributed systems or networks, or experience with compute technologies, storage, or hardware architecture.
  • 3 years of experience with software design and architecture.

Nice to have

  • Master’s degree or PhD in Engineering, Computer Science, or a related technical field.
  • 3 years of experience in a technical leadership role leading project teams and setting technical direction.
  • 3 years of experience working in a complex, matrixed organization involving cross-functional or cross-business projects.
  • Experience with Linux Internals, Cluster Management, System Architecture, Virtualization, and Security.

What the JD emphasized

  • ML/AI Infrastructure (GPU/TPUs in Borg)