Senior Software Engineer, Cuda Core Libraries

NVIDIA NVIDIA · Semiconductors · Germany +1 · Remote

Senior Software Engineer to work on CUDA Core Libraries (CCCL, cuda-python, numba-cuda) that power GPU computing for C++ and Python developers. Focus on foundational libraries, algorithms, and language/runtime infrastructure for deep learning, scientific computing, and data analytics.

What you'd actually do

  1. Develop and implement CUDA Core Libraries in C++ and/or Python, including parallel algorithms and idiomatic language bindings for core CUDA functionality.
  2. Compose, optimize, and evolve GPU algorithms and APIs, from high-level interfaces down to low-level performance tuning involving memory, parallelism, and synchronization.
  3. Own features end-to-end: develop, implementation, testing, benchmarking, documentation, and long-term maintenance.
  4. Improve developer experience across the stack: CI, tests, benchmarks, packaging, examples, and docs.
  5. Collaborate with senior CUDA engineers in design reviews, code reviews, and open-source-style workflows.

Skills

Required

  • C++
  • Python
  • systems-level software
  • performance
  • memory
  • concurrency
  • API design
  • modern C++
  • Python library development
  • parallel programming
  • heterogeneous programming
  • CUDA
  • OpenMP
  • GPU-accelerated Python
  • production software contribution
  • open-source libraries contribution
  • testing
  • profiling
  • code review
  • independent work
  • problem scoping
  • project completion
  • technical documentation
  • multi-language codebases
  • CMake
  • Pixi
  • CI systems

Nice to have

  • CPU/GPU architecture
  • CUDA C++
  • CUDA Python
  • PyTorch
  • JAX
  • Numba
  • CuPy
  • Thrust
  • CUB
  • libcudacxx
  • compiler infrastructure
  • LLVM
  • Clang tooling
  • MLIR
  • developer tools
  • library design

What the JD emphasized

  • Minimum of 8+ years of related development experience
  • Strong programming skills in C++, Python, or both, with proven interest in systems-level software (performance, memory, concurrency, API design).
  • Practical experience with parallel or heterogeneous programming (CUDA, OpenMP, GPU-accelerated Python, or similar).
  • Experience contributing to production software or open-source libraries, including testing, profiling, and code review.