Senior Math Libraries Engineer - Sparsity in AI

NVIDIA · Semiconductors · Santa Clara, CA +1 · Remote

Software engineer to design and develop C++ libraries and tools for unstructured sparsity in Deep Learning (DL) and High-Performance Computing (HPC) on NVIDIA GPUs. This involves DSL specifications, on-demand code generation, and enabling the system in Python/PyTorch. The role focuses on performance evaluation, library quality, and collaboration with product management.

What you'd actually do

  1. Design and develop a C++-based system to simplify and accelerate computing for unstructured sparsity in DL and HPC on NVIDIA GPUs
  2. Enable the system in languages and frameworks that are more commonly used in DL, such as Python and PyTorch
  3. Evaluate and improve the performance of the system on real-life applications
  4. Identify and act on opportunities to improve library quality, performance, and maintainability by writing effective and well-tested code for production use
  5. Work closely with product management and other internal and external partners to understand feature and performance requirements and contribute to technical roadmaps

Skills

Required

  • C++
  • parallel programming
  • CUDA
  • MPI
  • OpenMP
  • domain-specific language design
  • compiler optimizations
  • MLIR
  • TACO
  • Python

Nice to have

  • sparse linear algebra applications
  • LLMs
  • Deep Learning methods and frameworks
  • low-level GPU performance optimization
  • numerical linear algebra methods
  • CI/CD systems
  • JIRA

What the JD emphasized

  • 6+ years of overall experience in developing, debugging and optimizing high-performance software using C++ and parallel programming
  • Experience with domain-specific language design and compiler optimizations, in particular sparse compilers (MLIR or TACO)
  • Excellent C++, Python, and CUDA programming skills

Other signals

  • NVIDIA GPUs
  • unstructured sparsity
  • DL frameworks like PyTorch
  • high-performance linear algebra libraries