Senior AI Software Engineer, Kernel Libraries

NVIDIA · Semiconductors · Santa Clara, CA +1 · Remote

Senior AI Software Engineer focused on developing kernel libraries and inference systems software to accelerate AI workloads, including LLMs and agents, on NVIDIA's hardware. Responsibilities include innovating and optimizing kernels, designing abstractions for serving engines, and building compilers/runtimes.

What you'd actually do

  1. Innovating and developing new AI systems technologies for efficient inference
  2. Designing, implementing, and optimizing kernels for high-impact AI workloads
  3. Designing and implementing extensible abstractions for LLM serving engines
  4. Building efficient just-in-time, domain-specific compilers and runtimes
  5. Collaborating closely with other engineers at NVIDIA across deep learning frameworks, libraries, kernels, and GPU arch teams

Skills

Required

  • Master's degree in Computer Science, Electrical Engineering, or related field (or equivalent experience)
  • 6+ years of (academic/industry) experience with ML/DL systems development
  • Deep learning frameworks (e.g. PyTorch, JAX, TensorFlow, ONNX)
  • Inference engines and runtimes (e.g. vLLM, SGLang, and MLC)
  • Python
  • C/C++

Nice to have

  • PhD
  • Domain-specific compiler and library solutions for LLM inference and training (e.g. FlashInfer, FlashAttention)
  • Expertise in inference engines like vLLM and SGLang
  • Expertise in machine learning compilers (e.g. Apache TVM, MLIR)
  • GPU kernel development and performance optimizations (especially using CUDA C/C++, cuTile, Triton, or similar)
  • Open-source project ownership or contributions

What the JD emphasized

  • 6+ years of (academic/industry) experience with ML/DL systems development preferred
  • Strong experience in developing or using deep learning frameworks (e.g. PyTorch, JAX, TensorFlow, ONNX) and ideally inference engines and runtimes such as vLLM, SGLang, and MLC
  • Strong Python and C/C++ programming skills
  • Strong experience in GPU kernel development and performance optimizations (especially using CUDA C/C++, cuTile, Triton, or similar)

Other signals

  • developing libraries
  • code generators
  • GPU kernel technologies
  • LLM inference runtimes
  • accelerate large language models
  • agents