Senior Machine Learning Engineer, Performance

Google · Big Tech · Sunnyvale, CA +1

This role focuses on optimizing the performance of AI models (such as DeepSeek, Qwen, Gemini, and Gemma) on TPUs at Google's scale. It involves fleet-wide analysis, building scaling automation, and last-mile optimization using techniques such as sharding, quantization, and sparsity, while maintaining model quality. The role also involves working with product teams and researchers to solve performance problems, and it requires a deep understanding of model design, performance analysis, coding, compilers, and hardware.

What you'd actually do

  1. Analyze performance and efficiency metrics to identify bottlenecks.
  2. Engage with Google product teams, Cloud, and researchers to solve their performance problems.
  3. Apply parallelization and optimization techniques, such as sharding, quantization, and sparsity, to improve model performance while meeting pre-defined quality characteristics.
  4. Analyze and debug performance.

Skills

Required

  • software development
  • performance analysis
  • large-scale systems data analysis
  • visualization tools
  • debugging

Nice to have

  • data structures
  • algorithms
  • technical leadership

What the JD emphasized

  • performance
  • optimization
  • model performance
  • quality

Other signals

  • improving model performance
  • optimization techniques
  • TPUs
  • large-scale systems