Senior Staff Engineer, TPU Co-design

Google · Big Tech · Sunnyvale, CA +1

Senior Staff Engineer focused on co-designing TPU hardware for AI/ML training and serving. The role involves defining the hardware/software roadmap, bridging AI research with hardware design, and optimizing performance for large ML models. This position operates at the intersection of AI research and infrastructure engineering, aiming to deliver high-performance, power-efficient accelerators.

What you'd actually do

  1. Define and drive the technical roadmap and architecture for the hardware/software stack, ensuring unparalleled performance for the training and serving of large ML models.
  2. Act as the technical liaison between advanced research, software, and hardware teams, steering model architecture innovation to maximize scaling, quality, and hardware efficiency.
  3. Architect and oversee the development of next-generation configurable simulation frameworks and cycle-accurate performance models, setting the standard for how the organization evaluates complex micro-architectural decisions.
  4. Advocate system-level performance analysis across highly distributed ML systems, innovating new methodologies to balance compute, memory bandwidth, and inter-chip network requirements.
  5. Manage cross-functional partnerships across hardware engineering, compiler development, and ML research to influence broad organizational strategy and transition paradigm-shifting concepts into production.

Skills

Required

  • Bachelor's degree in Electrical Engineering, Computer Engineering, Computer Science, a related field, or equivalent practical experience.
  • 12 years of experience in computer architecture, chip architecture, or hardware-software co-design.
  • Experience developing systems for performance modeling, simulation, or system analysis.

Nice to have

  • Master's degree or PhD in Electrical Engineering, Computer Engineering, or Computer Science, with an emphasis on computer architecture.
  • Experience as a lead architect driving multi-generational hardware solutions or performance optimizations for massive-scale ML training and inference.
  • Experience with deep learning frameworks (e.g., PyTorch, TensorFlow) and their underlying execution models.
  • Knowledge of semiconductor trajectories, including process, memory, interconnects, and packaging.
  • Understanding of ML trends, business drivers, and the software ecosystem.
  • Ability to engage and align stakeholders, hardware designers, and the global ML research community.

What the JD emphasized

  • hardware/software stack
  • training and serving of large ML models
  • model architecture innovation
  • performance models
  • system-level performance analysis
  • ML research

Other signals

  • TPU architecture
  • AI/ML hardware acceleration
  • ML training and serving
  • custom silicon solutions