Staff Software Engineer, TPU Performance

Google · Sunnyvale, CA +1

Staff Software Engineer focused on optimizing the performance and efficiency of Google's TPU fleet for Machine Learning training and serving workloads, including models like Gemini. The role involves deep analysis of performance metrics, collaboration with product teams and researchers, and implementation of solutions at fleet scale, with work that may span the compiler, runtime, model co-design, quantization, and sparsity.

What you'd actually do

  1. Focus on Tensor Processing Unit (TPU) fleet efficiency analysis and performance optimization, while identifying and maintaining Machine Learning (ML) training and serving benchmarks.
  2. Use the benchmarks to identify performance opportunities and drive out-of-the-box performance by improving the compiler, runtime, and related infrastructure in collaboration with partner teams.
  3. Collaborate with Google product teams and researchers to solve performance problems, such as onboarding new Machine Learning models and products onto new Tensor Processing Unit hardware so that larger models can train efficiently at very large scale.
  4. Analyze performance and efficiency metrics to identify bottlenecks, then design and implement solutions at Google fleet-wide scale.
  5. Explore model and data efficiency techniques such as model co-design, quantization, and sparsity.
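
Of the efficiency techniques named above, quantization is the most self-contained to illustrate. Below is a minimal sketch of symmetric per-tensor int8 quantization in plain NumPy — my own illustration, not tied to the JD or any Google tooling; the function names are invented for this example:

```python
import numpy as np

def quantize_int8(x):
    # Symmetric per-tensor quantization: one scale derived from max |x|,
    # values rounded into the int8 range [-127, 127].
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original float tensor.
    return q.astype(np.float32) * scale

x = np.array([0.1, -1.5, 3.2, 0.0], dtype=np.float32)
q, s = quantize_int8(x)
x_hat = dequantize(q, s)
# Round-trip error is bounded by half a quantization step.
assert np.max(np.abs(x - x_hat)) <= s / 2 + 1e-6
```

In practice, serving-oriented schemes add refinements (per-channel scales, zero points for asymmetric ranges, calibration over activation statistics), but the scale/round/clip core is the same.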

Skills

Required

  • software development
  • testing and launching software products
  • performance, large-scale systems data analysis, visualization tools, or debugging
  • software design and architecture
  • ML performance analysis and benchmarking

Nice to have

  • Master’s degree or PhD in Engineering, Computer Science, or a related technical field
  • data structures and algorithms
  • technical leadership role leading project teams and setting technical direction
  • working in an organization involving cross-functional or cross-business projects
  • optimizing for NVIDIA/AMD architectures through low-level programming, performance modeling, and bottleneck analysis
  • hardware-aware algorithm design and compiler stacks (e.g., OpenXLA)

What the JD emphasized

  • performance optimization
  • ML training and serving benchmarks
  • onboarding new Machine Learning models
  • performance and efficiency metrics
  • model and data efficiency techniques

Other signals

  • TPU performance optimization
  • ML training and serving benchmarks
  • compiler and runtime improvements
  • onboarding new ML models
  • model and data efficiency techniques