Senior Software Engineer, Machine Learning Inference

NVIDIA · Semiconductors · Santa Clara, CA

Senior Software Engineer role focused on designing and implementing inference software optimizations for NVIDIA TensorRT and TensorRT-LLM to accelerate AI applications on NVIDIA GPUs. Involves C++, Python, and CUDA development, collaboration with AI experts, and optimization of deep learning frameworks and compilers.

What you'd actually do

  1. Design, develop, and optimize NVIDIA TensorRT and TensorRT-LLM to supercharge inference applications for datacenters, workstations, and PCs.
  2. Develop software in C++, Python, and CUDA for seamless and efficient deployment of state-of-the-art LLMs and Generative AI models.
  3. Collaborate with deep learning experts and GPU architects throughout the company to influence hardware and software design for inference.

Skills

Required

  • C++
  • Python
  • CUDA
  • Deep Learning Frameworks
  • Compilers
  • System Software
  • Software development experience

Nice to have

  • Rust
  • Inference backends
  • GPU programming (CUDA, OpenCL)
  • LLM inference frameworks (TensorRT-LLM, vLLM, SGLang)
  • Deep learning frameworks (TensorRT, PyTorch, JAX)
  • Close-to-metal performance analysis, optimization techniques, and tools

What the JD emphasized

  • Strong proficiency in C++ (required)

Other signals

  • Inference software optimizations
  • Deploying state-of-the-art LLMs
  • GPU programming