Technical Lead Manager, TorchTPU

Google · Big Tech · London, United Kingdom

Technical Lead Manager for TorchTPU at Google. The role leads a team of software engineers who develop and optimize frameworks and compilers for Cloud TPUs, enabling large-scale training and inference of ML models, particularly generative AI and LLMs, in collaboration with customers and open-source communities.

What you'd actually do

  1. Lead and manage a team of software engineers, promoting a collaborative culture and psychological safety.
  2. Coach and mentor engineers to achieve their potential while aligning team execution with TorchTPU priorities and organizational goals.
  3. Collaborate with global peer managers and teams to drive AI framework development, enabling PyTorch models to run with peak performance on Cloud TPUs.
  4. Deliver end-to-end performance compiler optimizations and contribute to open-source software, supporting advanced ML frameworks and compilers on Cloud TPUs and GPUs.
  5. Enable PyTorch models at massive scale for generative models, computer vision, language modeling, and other advanced machine learning applications.

Skills

Required

  • software development in one or more programming languages (e.g., Python, C++ or C)
  • technical leadership role; overseeing projects
  • people management, supervision/team leadership role
  • machine learning frameworks
  • compiler technology
  • high-performance computing (HPC)
  • leading engineering projects with cross-functional or global stakeholders

Nice to have

  • Master’s degree or PhD in Engineering, Computer Science, or a related technical field
  • leading teams on compiler stacks or infrastructure, such as Multi-Level Intermediate Representation (MLIR) or Low Level Virtual Machine (LLVM)
  • optimizing performance for Generative AI and Large Language Models (LLMs)
  • contributing to or maintaining large-scale open-source machine learning projects
  • HPC, GPU workloads, or ML frameworks like JAX, PyTorch, or TensorFlow
  • Proven track record of delivering global projects through cross-functional collaboration

What the JD emphasized

  • manage a team of Engineers
  • manage your project goals
  • manage engineers across multiple teams and locations
  • manage a large product budget
  • manage a team of software engineers
  • people management
  • technical leadership role
  • team leadership role
  • lead engagements with customers
  • lead the way
  • lead and manage a team
  • lead engagements
  • lead teams on compiler stacks

Other signals

  • Develop frameworks and compilers that support the GCP Cloud TPU service
  • Provide customers with large-scale access to Google’s first-party ML supercomputers to run training and inference workloads using PyTorch and JAX
  • Lead engagements with customers to help them achieve massive scale and speed on Google’s TPUs
  • Drive AI framework development, enabling PyTorch models to run with peak performance on Cloud TPUs