Software Engineer, AI and DL Kernel Libraries

NVIDIA · Semiconductors · Shanghai, China

Software Engineer focused on developing and optimizing AI inference systems software for NVIDIA GPUs, including deep learning primitives, kernel libraries, and LLM inference runtimes.

What you'd actually do

  1. Develop production-quality software that ships as part of NVIDIA's AI software stack, including cuDNN, FlashInfer, and optimized support for large language model inference workloads.
  2. Innovate and develop new AI systems technologies for efficient inference, with a focus on performance, scalability, maintainability, and usability.
  3. Design, implement, and optimize kernels for high-impact AI workloads across LLM inference, generative AI, computer vision, autonomous driving, and recommender systems.
  4. Design and implement extensible software abstractions for deep learning libraries, LLM serving engines, and runtime systems.
  5. Build and improve just-in-time compilation, code generation, and runtime technologies for performance-critical GPU workloads.

Skills

Required

  • Master's degree in Computer Science, Electrical Engineering, or a related field, or equivalent experience.
  • 3+ years of relevant industry, research, or systems software development experience in machine learning, deep learning systems, compilers, or GPU software.
  • Strong programming skills in C/C++ and Python, with hands-on experience developing high-performance software.
  • Solid experience with CUDA development and GPU programming fundamentals.
  • Strong experience developing or using deep learning frameworks such as PyTorch, JAX, TensorFlow, or ONNX.
  • Good understanding of linear algebra, performance analysis, profiling, and code optimization.
  • Experience designing software abstractions, APIs, or higher-level system architecture for performance-sensitive systems.
  • Familiarity with modern machine learning and inference system trends, especially around LLMs and generative AI.

Nice to have

  • Hands-on experience with inference engines and runtimes such as vLLM, SGLang, MLC, TensorRT-LLM, or similar systems.
  • Background in domain-specific compiler, code generation, or library solutions for LLM inference and training.
  • Expertise in machine learning compilers or IR systems such as MLIR, Apache TVM, TensorIR, or related technologies.
  • Practical experience with GPU performance modeling, computer architecture, or accelerator-oriented software design.
  • Open-source project ownership or meaningful contributions in deep learning systems, compilers, kernels, or inference infrastructure.

What the JD emphasized

  • production-quality software
  • efficient inference
  • performance
  • scalability
  • maintainability
  • usability
  • kernels
  • LLM inference
  • generative AI
  • computer vision
  • autonomous driving
  • recommender systems
  • software abstractions
  • LLM serving engines
  • runtime systems
  • just-in-time compilation
  • code generation
  • runtime technologies
  • performance-critical GPU workloads
  • workload performance
  • tune current software
  • future software and hardware-software interfaces
  • deep learning frameworks
  • PyTorch
  • JAX
  • TensorFlow
  • ONNX
  • linear algebra
  • performance analysis
  • profiling
  • code optimization
  • APIs
  • higher-level system architecture
  • performance-sensitive systems
  • modern machine learning
  • inference system trends
  • LLMs
  • GPU kernel development
  • performance optimization
  • CUDA C/C++
  • cuTile
  • Triton

Other signals

  • Collaborate with world-class engineers across deep learning software, compilers, GPU architecture, and open-source inference ecosystems