Senior Deep Learning Software Engineer

NVIDIA · Semiconductors · Santa Clara, CA +1

NVIDIA is seeking a Senior Deep Learning Software Engineer to design and build an automated inference and deployment solution with a scalable architecture focused on ease of use and compute efficiency. The role involves developing features in high-level frameworks, implementing a high-performance execution environment, and performing low-level GPU optimizations.

What you'd actually do

  1. Play a pivotal role in defining a modular, scalable platform that bridges training and deployment workflows, enabling tight integration of deployment tooling with training frameworks such as Megatron and NeMo.
  2. Leverage and build upon the PyTorch 2.0 ecosystem (TorchDynamo, torch.export, torch.compile, etc.) to analyze and extract a standardized model graph representation from arbitrary PyTorch models for our automated deployment solution.
  3. Develop support for inference optimization techniques such as speculative decoding and LoRA.
  4. Collaborate with teams across NVIDIA to use performant kernel implementations within the automated deployment solution.
  5. Analyze and profile GPU kernel-level performance to identify hardware and software optimization opportunities.

Skills

Required

  • Master's, PhD, or equivalent experience in Computer Science, AI, Applied Math, or a related field
  • Python
  • PyTorch
  • ML tools
  • algorithms
  • programming fundamentals
  • software design
  • debugging
  • performance analysis
  • test design
  • communication skills

Nice to have

  • Contributions to PyTorch, JAX, or other machine learning frameworks
  • GPU architecture
  • compilation stack
  • end-to-end performance debugging
  • NVIDIA's deep learning SDKs (TensorRT)
  • CUDA
  • CUTLASS
  • Triton

What the JD emphasized

  • 8+ years of relevant work or research experience in Deep Learning
  • Excellent software design skills, including debugging, performance analysis, and test design
  • Strong proficiency in Python, PyTorch, and related ML tools
  • Strong algorithms and programming fundamentals
  • Prior experience writing high-performance GPU kernels for machine learning workloads using tools such as CUDA, CUTLASS, or Triton

Other signals

  • DL inference
  • deployment solution
  • GPU optimizations
  • inference performance