Senior Math Libraries Engineer – Emulation in AI and HPC

NVIDIA · Semiconductors · France +2 · Remote

NVIDIA is seeking software engineers to join its math libraries teams focused on AI and HPC kernel generation. The role involves designing and implementing high-performance numerical linear algebra software on GPUs, with a focus on emulating math operations across different precisions. It requires strong CUDA and C++ experience and solid fundamentals in finite-precision arithmetic and numerical methods for linear algebra; knowledge of GPU architecture is listed as a nice-to-have. Although the role supports AI and HPC, the core craft is foundational math libraries and kernel generation, not direct AI model development or deployment.

What you'd actually do

  1. Scoping, designing, and implementing high-quality, high-performance dense numerical linear algebra software on GPUs.
  2. Providing technical leadership and feedback to library engineers working with you on projects, and occasionally mentoring interns.
  3. Working closely with product management and other internal and external customers to understand feature and performance requirements and help define the technical roadmaps of libraries.
  4. Finding opportunities to improve library performance and reduce code maintenance overhead through re-architecting.

Skills

Required

  • PhD or Master’s degree in Computer Science, Applied Math, or related science or engineering field of study (or equivalent experience).
  • 5+ years of experience designing, developing, testing, maintaining, and optimizing the performance of production software using CUDA and C++.
  • Strong fundamentals in finite-precision arithmetic and numerical methods for linear algebra.

Nice to have

  • Good knowledge of GPU (preferred) or CPU hardware architecture.
  • Experience with CUTLASS or low-level programming, such as assembly, for performance optimization is a huge plus.
  • Proficiency in a scripting language, preferably Python.
  • Experience working in a globally distributed team.

What the JD emphasized

  • production software
  • performance optimization
  • CUDA
  • C++
  • GPU
  • finite-precision arithmetic
  • numerical methods for linear algebra