Manager, Large Language Model Inference

NVIDIA · Semiconductors · Santa Clara, CA +1 · Remote

The Manager, Large Language Model Inference at NVIDIA focuses on developing and optimizing LLM/VLM/VLA inference software for NVIDIA GPUs and hardware platforms. The role involves leading a team in specialized kernel development, runtime optimizations, and frameworks for LLM inference, with a strong emphasis on delivering production-grade, high-performance software.

What you'd actually do

  1. Lead and grow a team responsible for specialized kernel development, runtime optimizations, and frameworks for LLM inference.
  2. Drive the design, development, and delivery of production inference software, targeting NVIDIA's next-generation enterprise and edge hardware platforms.
  3. Integrate cutting-edge technologies developed at NVIDIA and offer an intuitive developer experience for LLM deployment.
  4. Lead software development execution, with responsibility for project planning, milestone delivery, and cross-functional coordination.

Skills

Required

  • MS, PhD, or equivalent experience in Computer Science, Computer Engineering, AI, or a related technical field.
  • 7+ years of overall software engineering experience, including 3+ years of technical leadership experience.
  • Proven ability to lead and scale high-performing engineering teams, especially across distributed and cross-functional groups.
  • Strong background in C++ or Python, with expertise in software design and delivering production-quality software libraries.
  • Demonstrated expertise in large language models (LLM) and/or vision language models (VLM).

Nice to have

  • Deep understanding of GPU architecture, CUDA programming, and system-level performance tuning.
  • Background in LLM inference or working with frameworks such as TensorRT-LLM, vLLM, or SGLang.
  • Passion for building scalable, user-friendly APIs and enabling developers in the AI ecosystem.
  • Proven track record of growing and managing a team that encourages idea sharing, empowers team members, and provides opportunities for professional growth.

What the JD emphasized

  • LLM inference
  • production-quality software libraries
  • growing and managing a team

Other signals

  • LLM inference software technologies
  • LLM/VLM/VLA inference
  • production inference software
  • next-generation enterprise and edge hardware platforms
  • core LLM inference runtime