Senior Staff Software Architect, GPU Uber Tech Leads

Google · Big Tech · Sunnyvale, CA +1

Senior Staff Software Architect role focused on the software stack above firmware for Google's AI and HPC infrastructure, covering distributed systems, Linux OS, networking, and power management for accelerator platforms such as GPUs and TPUs. The role involves technical leadership, architecture definition, and driving large-scale technical programs from concept to deployment to enable massive-scale AI and Cloud services.

What you'd actually do

  1. Serve as the Tech Lead (TL), defining the architecture and technical roadmap for the software stack on our accelerator platforms.
  2. Drive large-scale technical programs from concept to deployment, ensuring cross-team alignment and on-time delivery of complex systems. This includes interfacing with hardware, software, and SRE teams to deliver scalable solutions for Google's Data Centers.
  3. Guide multiple teams through the successful design, development, and execution of this roadmap.
  4. Focus on distributed systems software, core Linux OS components, Linux Networking, Power Management strategies, and the intricate interactions with hardware buses such as PCIe, USB, and I2C.

Skills

Required

  • software development
  • technical leadership
  • architecting and developing software for distributed systems
  • C or C++
  • Linux OS internals, kernel development, or systems programming
  • Linux networking concepts and development

Nice to have

  • system-level power management techniques
  • software development for accelerators (e.g., GPUs, TPUs) in data center environments
  • low-level platform bring-up and debugging
  • technically leading and mentoring a team of Engineers
  • industry standardization bodies (e.g., PCI-SIG, Compute Express Link (CXL) Consortium, Distributed Management Task Force (DMTF), Open Compute Project (OCP))
  • High-Performance Computing (HPC) systems and networking

What the JD emphasized

  • architect and drive the software innovations that power Google's AI and HPC infrastructure
  • enabling massive-scale deployment of Accelerators (e.g., GPUs, TPUs, etc.) for critical Google services and Cloud
  • Your work is fundamental to unlocking new frontiers in AI
  • architecting and developing software for distributed systems
  • Linux OS internals, kernel development, or systems programming
  • Linux networking concepts and development
