Staff Engineer, TPU Co-design

Google · Big Tech · Sunnyvale, CA +1

Staff Engineer focused on co-designing TPU hardware for AI/ML applications, bridging model architecture innovation with next-generation hardware design. Responsibilities include optimizing the hardware/software stack for ML model training and serving, developing simulators, and conducting system-level performance analysis.

What you'd actually do

  1. Drive the definition and optimization of the hardware/software stack to enable performant training and serving of large ML models.
  2. Collaborate with research and modeling teams to innovate on model architectures, focusing on scaling, quality, and their direct impact on hardware performance.
  3. Lead the development of configurable architectural simulators and cycle-accurate performance models to quantify microarchitectural optimizations and evaluate architectural decisions.
  4. Conduct system-level performance analysis across highly distributed ML systems, developing new methodologies to balance compute, memory bandwidth, and inter-chip network requirements.
  5. Engage with partners across hardware design, compiler development, and ML research to transition architectural innovations from concept to production.

Skills

Required

  • 10 years of experience in computer architecture, chip architecture, or hardware-software co-design.
  • Experience developing systems for performance modeling, simulation, or system analysis.
  • Bachelor's degree in Electrical Engineering, Computer Engineering, Computer Science, or a related field, or equivalent practical experience.

Nice to have

  • Master's degree or PhD in Electrical Engineering, Computer Engineering, or Computer Science, with an emphasis on computer architecture.
  • Experience architecting hardware solutions or performance optimizations for large-scale ML training and inference.
  • Experience with deep learning frameworks such as TensorFlow or PyTorch.
  • Deep understanding of ML trends, business drivers, and the software ecosystem.
  • Ability to engage and collaborate with hardware designers, software architects, and ML researchers.

What the JD emphasized

  • custom silicon solutions
  • AI/ML applications
  • AI research and engineering
  • ML research
  • massive foundation models
  • advanced silicon architectures
  • AI and Infrastructure
  • AI models
  • ML training and inference
  • ML trends

Other signals

  • TPU architecture
  • AI/ML hardware acceleration
  • ML serving and training capabilities
  • foundation models
  • silicon architectures