Staff Software Engineer, GPU Performance

Google · Big Tech · Kirkland, WA, and 2 other locations

Staff Software Engineer focused on optimizing GPU performance for LLM training and serving within Google Cloud's AI infrastructure. This role involves identifying performance bottlenecks, running benchmarks, and implementing solutions at scale, with a strong emphasis on low-level GPU programming and compiler optimizations.

What you'd actually do

  1. Identify and maintain LLM training and serving benchmarks, and use them to find performance opportunities and drive XLA:GPU/Triton performance improvements into XLA releases.
  2. Work with teams such as DeepMind to solve challenging ML model performance problems.
  3. Run architecture-level simulations of GPU designs and perform roofline analysis to guide partner teams.
  4. Analyze performance and efficiency metrics to identify bottlenecks and then design and implement solutions at Google fleet-wide scale.
  5. Run performance benchmarks on GPU hardware using internal and external tools such as TRT-LLM, vLLM, and SGLang.

Skills

Required

  • Software development
  • Software design and architecture
  • Testing and launching software products
  • Modern GPU architectures (NVIDIA, AMD, or other AI accelerators)
  • Memory hierarchies
  • Performance bottlenecks
  • Modern LLMs and their deployment on AI accelerators
  • Low-level GPU programming (CUDA, Triton, CUTLASS, etc.)
  • Performance engineering techniques

Nice to have

  • Master’s degree or PhD in Engineering, Computer Science, or a related technical field
  • Data structures and algorithms
  • Technical leadership
  • Cross-functional project experience
  • Compiler optimization
  • Code generation
  • Runtime systems for GPU architectures (OpenXLA, MLIR, Triton, etc.)

What the JD emphasized

  • modern GPU architectures
  • modern LLMs and their deployment on AI accelerators
  • low-level GPU programming
  • performance engineering techniques
  • compiler optimization
  • code generation
  • runtime systems for GPU architectures

Other signals

  • GPU performance optimization for ML models
  • LLM training and serving benchmarks
  • Low-level GPU programming
  • Compiler optimization for AI accelerators