Senior Software Engineer, AI and DL Kernel Libraries

NVIDIA · Semiconductors · Santa Clara, CA (+7 locations) · Remote

Develops libraries, code generators, and GPU kernel technologies for NVIDIA's AI inference systems software stack, accelerating inference for LLMs and agents through efficient kernels, abstractions, and runtimes.

What you'd actually do

  1. Innovating and developing new AI systems technologies for efficient inference
  2. Designing, implementing, and optimizing kernels for high impact AI workloads
  3. Designing and implementing extensible abstractions for LLM serving engines
  4. Building efficient just-in-time domain specific compilers and runtimes
  5. Collaborating closely with other engineers at NVIDIA across deep learning frameworks, libraries, kernels, and GPU arch teams

Skills

Required

  • Master's degree in Computer Science, Electrical Engineering, or a related field (or equivalent experience)
  • 6+ years of ML/DL systems development experience
  • Deep learning frameworks (PyTorch, JAX, TensorFlow, ONNX)
  • Inference engines and runtimes (vLLM, SGLang, MLC)
  • Python programming
  • C/C++ programming
  • GPU kernel development
  • Performance optimizations
  • CUDA C/C++
  • cuTile
  • Triton

Nice to have

  • PhD
  • Domain specific compiler and library solutions for LLM inference and training (e.g. FlashInfer, Flash Attention)
  • Inference engines like vLLM and SGLang
  • Machine learning compilers (e.g. Apache TVM, MLIR)
  • Open source project ownership or contributions

What the JD emphasized

  • Master's degree in Computer Science, Electrical Engineering, or a related field (or equivalent experience); a PhD is preferred
  • 6+ years of academic or industry experience with ML/DL systems development preferred
  • Strong experience developing or using deep learning frameworks (e.g., PyTorch, JAX, TensorFlow, ONNX), and ideally inference engines and runtimes such as vLLM, SGLang, and MLC
  • Strong experience in GPU kernel development and performance optimizations (especially using CUDA C/C++, cuTile, Triton, or similar)

Other signals

  • inference systems software stack
  • accelerating AI inference
  • GPU kernel technologies
  • LLM inference runtime components
  • kernel code generators