Software Engineer, GDC LLM Serving and GPU Performance

Google · Sunnyvale, CA +3

A Software Engineer role focused on optimizing LLM serving infrastructure and GPU performance, covering disaggregated serving, KV cache mechanisms, resource allocation, and performance analysis tooling. The role collaborates with research and engineering teams to deploy LLMs efficiently in production.

What you'd actually do

  1. Design, develop, and implement enhancements to the LLM serving stack, focusing on performance, scalability, and resource efficiency (e.g., on systems like Wiz, Servomatic).
  2. Contribute to the design and implementation of advanced serving architectures, including disaggregated serving.
  3. Build and maintain infrastructure and tooling for in-depth performance analysis, profiling, and benchmarking of LLM models on GPU accelerators.
  4. Identify and address performance bottlenecks across the stack, working closely with teams providing core GPU libraries and kernels.
  5. Collaborate with research, engineering, and SRE teams to optimize and deploy LLMs in production.

Skills

Required

  • software development
  • software design and architecture
  • ML infrastructure
  • model deployment
  • model evaluation
  • data processing
  • debugging
  • fine-tuning
  • GPU performance
  • performance analysis
  • profiling
  • benchmarking
  • LLM serving
  • disaggregated serving
  • resource efficiency

Nice to have

  • Speech/audio
  • reinforcement learning
  • technical leadership
  • cross-functional collaboration

What the JD emphasized

  • 8 years of experience in software development
  • 5 years of experience testing and launching software products, and 3 years of experience with software design and architecture
  • 5 years of experience with one or more of the following: Speech/audio (e.g., technology duplicating and responding to the human voice), reinforcement learning (e.g., sequential decision making), ML infrastructure, or specialization in another ML field.
  • 5 years of experience with ML design and ML infrastructure (e.g., model deployment, model evaluation, data processing, debugging, fine-tuning).

Other signals

  • LLM serving
  • GPU performance
  • disaggregated serving
  • performance analysis
  • resource efficiency