Senior Software Engineer, CUDA Core Libraries

NVIDIA · Semiconductors · Santa Clara, CA +1 · Remote

Senior Software Engineer role working on the foundational CUDA Core Libraries for GPU computing, which serve C++ and Python developers. The role involves developing, implementing, optimizing, and maintaining parallel algorithms and APIs for deep learning, scientific computing, and data analytics.

What you'd actually do

  1. Develop and implement CUDA Core Libraries in C++ and/or Python, including parallel algorithms and idiomatic language bindings for core CUDA functionality.
  2. Compose, optimize, and evolve GPU algorithms and APIs, from high-level interfaces down to low-level performance tuning involving memory, parallelism, and synchronization.
  3. Own features end-to-end: development, implementation, testing, benchmarking, documentation, and long-term maintenance.
  4. Improve developer experience across the stack: CI, tests, benchmarks, packaging, examples, and docs.
  5. Collaborate with senior CUDA engineers in design reviews, code reviews, and open-source-style workflows.

Skills

Required

  • C++
  • Python
  • Systems-level software
  • Parallel programming
  • Heterogeneous programming
  • CUDA
  • OpenMP
  • GPU-accelerated Python
  • Production software contribution
  • Open-source library contribution
  • Testing
  • Profiling
  • Code review
  • Independent work
  • Problem scoping
  • Project completion
  • Technical design documentation
  • Large-codebase navigation
  • CMake
  • Pixi
  • CI systems

Nice to have

  • CPU/GPU architecture understanding
  • CUDA C++
  • CUDA Python
  • PyTorch
  • JAX
  • Numba
  • CuPy
  • GPU-accelerated stacks
  • Thrust
  • CUB
  • libcudacxx
  • Modern C++/GPU libraries
  • Compiler infrastructure
  • Tooling
  • LLVM
  • Clang tooling
  • MLIR
  • Developer tools
  • Library design

What the JD emphasized

  • 8+ years of related development experience
  • Strong programming skills in C++, Python, or both, with proven interest in systems-level software (performance, memory, concurrency, API design).
  • Solid understanding of modern C++ (templates, generics, standard library) and/or Python library development and packaging.
  • Practical experience with parallel or heterogeneous programming (CUDA, OpenMP, GPU-accelerated Python, or similar).
  • Experience contributing to production software or open-source libraries, including testing, profiling, and code review.