Staff Software Engineer, GPU Performance

Google · Big Tech · Sunnyvale, CA +3

Staff Software Engineer focused on optimizing GPU performance for LLM training and serving within Google Cloud. This role involves benchmarking, performance analysis, and implementing solutions at scale, working with cutting-edge AI accelerators and low-level GPU programming.

What you'd actually do

  1. Identify and maintain LLM training and serving benchmarks, and use them to find performance opportunities and drive XLA:GPU/Triton performance toward XLA releases.
  2. Engage with various teams, like DeepMind, to solve challenging ML model performance problems.
  3. Run architecture-level simulations on GPU designs and perform roofline analysis to guide partner teams.
  4. Analyze performance and efficiency metrics to identify bottlenecks and then design and implement solutions at Google fleet-wide scale.
  5. Run performance benchmarks on GPU hardware using internal and external tools such as TRT-LLM, vLLM, and SGLang.
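
For readers unfamiliar with the roofline analysis mentioned in item 3, here is a minimal sketch of the idea. It is not taken from the posting. The function names and the hardware figures (peak throughput, memory bandwidth) are hypothetical placeholders chosen for easy arithmetic, not the specs of any real GPU.

```python
# Roofline model sketch: attainable throughput is capped either by peak
# compute or by memory bandwidth times arithmetic intensity (FLOP per byte).
# All hardware numbers below are hypothetical, for illustration only.

def ridge_point(peak_flops: float, mem_bandwidth: float) -> float:
    """Arithmetic intensity (FLOP/byte) at which a kernel stops being memory-bound."""
    return peak_flops / mem_bandwidth


def attainable_flops(peak_flops: float, mem_bandwidth: float, intensity: float) -> float:
    """Upper bound on achievable FLOP/s for a kernel with the given intensity."""
    return min(peak_flops, mem_bandwidth * intensity)


def classify(peak_flops: float, mem_bandwidth: float, intensity: float) -> str:
    """Label a kernel by which roof limits it."""
    if intensity < ridge_point(peak_flops, mem_bandwidth):
        return "memory-bound"
    return "compute-bound"


# Hypothetical accelerator: 1e15 FLOP/s peak, 2e12 bytes/s memory bandwidth.
PEAK_FLOPS = 1e15
MEM_BANDWIDTH = 2e12

if __name__ == "__main__":
    for name, intensity in [("low-intensity kernel", 2.0), ("high-intensity kernel", 1000.0)]:
        bound = attainable_flops(PEAK_FLOPS, MEM_BANDWIDTH, intensity)
        label = classify(PEAK_FLOPS, MEM_BANDWIDTH, intensity)
        print(f"{name}: {label}, at most {bound:.3e} FLOP/s")
```

Under these assumed numbers the ridge point is 500 FLOP/byte. A kernel below that intensity is limited by memory traffic, so optimization effort would go toward data movement rather than raw compute.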

Skills

Required

  • software development
  • software design and architecture
  • modern GPU architectures
  • memory hierarchies
  • performance bottlenecks
  • modern LLMs
  • deployment on AI accelerators
  • low-level GPU programming
  • CUDA
  • Triton
  • performance engineering techniques

Nice to have

  • Master’s degree or PhD
  • data structures and algorithms
  • technical leadership
  • cross-functional projects
  • compiler optimization
  • code generation
  • runtime systems for GPU architectures
  • OpenXLA
  • MLIR
  • Triton

What the JD emphasized

  • modern GPU architectures
  • modern LLMs and their deployment on AI accelerators
  • low-level GPU programming
  • performance engineering techniques
  • GPU hardware
  • performance benchmarks

Other signals

  • GPU performance optimization
  • LLM serving benchmarks
  • low-level GPU programming
  • AI accelerator deployment