Senior DL Algorithms Engineer - Inference Performance

NVIDIA · Semiconductors · Santa Clara, CA +1 · Remote

Senior engineering role optimizing LLM and Omni model inference performance on NVIDIA's accelerated inference software stack, working across hardware and software layers. The work involves enabling and optimizing open models, contributing code to frameworks such as TRT-LLM and vLLM, profiling bottlenecks, and benchmarking.

What you'd actually do

  1. Enable and optimize state-of-the-art open models (like Nemotron and Cosmos) on NVIDIA’s accelerated inference SW stack.
  2. Contribute new features, fix bugs, and deliver production code to open-source frameworks such as TRT-LLM, vLLM, SGLang, and FlashInfer.
  3. Profile and analyze bottlenecks across the full inference stack to push the boundaries of inference performance.
  4. Benchmark state-of-the-art offerings and perform competitive analysis for NVIDIA’s SW/HW stack.
  5. Co-design with partner teams to develop the next generation of AI models and services.

Skills

Required

  • PhD in CS, EE, or CSEE, or equivalent experience
  • 3+ years of experience
  • Strong background in deep learning and neural networks, in particular inference
  • Experience with performance profiling, analysis and optimization, especially for GPU-based applications
  • Proficiency in PyTorch or an equivalent AI framework, or in HPC-heavy application development
  • Deep understanding of computer architecture, and familiarity with the fundamentals of GPU architecture

Nice to have

  • Proven experience with processor and system-level performance optimization
  • Deep understanding of modern LLM/Diffusion architectures
  • Strong fundamentals in algorithms
  • GPU programming experience (CUDA or OpenCL)

What the JD emphasized

  • performance analysis and optimization
  • squeeze every last clock cycle
  • across all layers of the hardware/software stack
  • peak performance
  • inference performance
  • processor and system-level performance optimization
  • GPU programming experience (CUDA or OpenCL) is a strong plus

Other signals

  • LLM inference optimization
  • GPU architecture
  • deep learning frameworks