Senior Systems Software Engineer, CUDA Driver - Multi-node and Memory Model

NVIDIA · Semiconductors · Santa Clara, CA +2 · Remote

NVIDIA is seeking a Senior Systems Software Engineer to work on the CUDA Driver, focusing on multi-node scalability and memory models for next-gen AI applications. The role involves architecting and implementing new features, coordinating development, and improving the CUDA platform.

What you'd actually do

  1. Evangelize, architect, and implement new features related to CUDA’s memory model and multi-node scalability geared towards next-gen AI applications and deployments
  2. Coordinate and drive development efforts across multiple teams
  3. Help define forward-looking improvements to the CUDA APIs and programming model
  4. Write effective, maintainable, and well-tested code
  5. Develop code for multiple operating systems

Skills

Required

  • BS or MS degree in Computer Science, Electrical Engineering or related field (or equivalent experience)
  • Strong C and C++ programming skills
  • Minimum of 8 years of related development experience
  • Experience driving projects across multiple teams
  • Experience working with large codebases
  • Background with operating system interfaces for threads, process control, and virtual memory
  • Experience writing and debugging multithreaded programs
  • Good written communication and presentation skills

Nice to have

  • Prior experience with parallel computing, PyTorch, or low-latency AI inference
  • Understanding of system level architecture, such as interconnects, memory hierarchy, interrupts, and memory-mapped IO
  • Knowledge of memory coherence and consistency models
  • Background with kernel mode development
  • Experience with Linux or Windows systems software development

What the JD emphasized

  • Minimum of 8 years of related development experience
  • Experience driving projects across multiple teams
  • Background with operating system interfaces for threads, process control, and virtual memory
  • Experience writing and debugging multithreaded programs