Software Tech Lead, GPU, AI Infrastructure

Google · Big Tech · Taipei, Taiwan

This role involves leading a team focused on developing and integrating system software for GPU-based AI/ML supercomputers in Google's data centers. The work includes designing, developing, testing, and debugging system software and networking technologies for accelerator products, with a focus on enabling AI/ML innovations for Google and its Cloud customers. The role requires interaction with various software and hardware components, collaboration with cross-functional teams, and setting team priorities and technical roadmaps.

What you'd actually do

  1. Lead and guide a team with a distinct technical portfolio that interacts with a large set of Google services, Cloud teams, and cross-functional stakeholders.
  2. Set and communicate team priorities that support the broader organization's goals and develop the mid-term technical outlook and roadmap. Align strategy, processes, and decision-making across teams.
  3. Develop, integrate, test, deploy and debug the system software for GPU and other accelerator systems.
  4. Interact and integrate with a variety of software and hardware components, including board and chip firmware, Linux kernel drivers, high-speed interconnect bus firmware, hardware design, and Google's data center server management and monitoring stack.
  5. Collaborate with hardware, manufacturing, data center operations, and cloud engineering teams, as well as external partners, to plan and execute programs end-to-end, including product development, vendor engagement, manufacturing, and productivity improvements.

Skills

Required

  • software development
  • testing and launching software products
  • embedded operating systems
  • software design and architecture
  • technical leadership

Nice to have

  • data structures and algorithms
  • working in a complex, matrixed organization on cross-functional or cross-business projects

What the JD emphasized

  • GPU-based AI/ML supercomputers
  • system software for GPU and other accelerator systems
  • technical leadership role

Other signals

  • enabling AI/ML innovations
  • systems software and networking technologies for accelerator products
  • AI and Infrastructure team is redefining what’s possible
  • delivering AI and Infrastructure at unparalleled scale, efficiency, reliability and velocity
  • shaping the future of world-leading hyperscale computing