Engineering Manager, LLM Performance

NVIDIA · Semiconductors · Santa Clara, CA

NVIDIA is seeking an Engineering Manager to lead a team focused on accelerating LLM inference software across various frameworks and NVIDIA's datacenter products. The role involves driving the design, implementation, and optimization of inference performance, integrating cutting-edge technologies, and managing software development execution.

What you'd actually do

  1. Lead and grow a team responsible for pushing the performance of LLM inference across multiple LLM frameworks, including TensorRT LLM, vLLM, SGLang, and Dynamo, on our datacenter products.
  2. Drive the design, implementation, and optimization of features that are key to performance in LLM inference.
  3. Continuously improve the performance of LLM inference on current and upcoming NVIDIA datacenter architectures and GPUs.
  4. Continuously improve the performance of LLM inference for important foundation models.
  5. Work with inference benchmark teams to help tune performance for key workloads.

Skills

Required

  • MS, PhD, or equivalent experience in Computer Science, Computer Engineering, AI, or a related technical field.
  • 7+ years of overall software engineering experience, including 3+ years of technical leadership experience.
  • Strong background in C++ or Python, with expertise in software design and delivering production-quality software libraries.

Nice to have

  • Deep understanding of GPU architecture, CUDA programming, and system-level performance tuning.
  • Background in LLM inference or working with frameworks such as TensorRT-LLM, vLLM, or SGLang.
  • Passion for building scalable, user-friendly APIs and enabling developers in the AI ecosystem.
  • Proven track record of growing and managing a team that encourages idea sharing, empowers team members, and provides opportunities for professional growth.

What the JD emphasized

  • proven ability to lead and scale high-performing engineering teams
  • demonstrated expertise in large language models (LLM) and/or vision language models (VLM) and/or inference in general
  • Deep understanding of GPU architecture, CUDA programming, and system-level performance tuning.
  • Background in LLM inference or working with frameworks such as TensorRT-LLM, vLLM, or SGLang.

Other signals

  • accelerating LLM inference
  • pushing the performance of LLM inference
  • lead and grow a team