Software Engineering Manager, GPU AI Infrastructure

Google · Big Tech · Taipei, Taiwan

Software Engineering Manager responsible for leading a team that develops system software and networking technologies for GPU-based AI/ML supercomputers in Google's data centers. This includes managing engineers, defining technical roadmaps, and overseeing the integration, testing, deployment, and debugging of system software for accelerator products.

What you'd actually do

  1. Manage, grow, and lead a team with a varied technical portfolio that works with a large set of Google Services and Cloud teams and with cross-functional stakeholders.
  2. Develop, integrate, test, deploy and debug the system software for GPU and other accelerator systems.
  3. Interact and integrate with a variety of software and hardware components, including board and chip firmware, Linux kernel drivers, high-speed interconnect bus firmware, hardware design, and the Google data center server management and monitoring stack.
  4. Collaborate with hardware, manufacturing, data center operations, and cloud engineering teams, as well as external partners, to plan and execute programs end-to-end, including product development, vendor engagement, manufacturing, and productivity improvements.
  5. Define technical goals and roadmaps that bridge team priorities with organizational goals. Drive growth through clear role expectations, consistent coaching, and proactive feedback.

Skills

Required

  • software development
  • embedded operating systems
  • technical leadership role
  • people management or team leadership role

Nice to have

  • developing software that interacts with hardware
  • embedded systems
  • drivers
  • system software
  • system integration
  • NPI (New Product Introduction)
  • Machine Learning (ML) concepts
  • GPUs

What the JD emphasized

  • GPU AI Infrastructure
  • systems software
  • accelerator products
  • GPU
  • Machine Learning (ML) concepts and GPUs

Other signals

  • data center deployment
  • resource management
  • AI/ML supercomputers