Senior Software Engineer, NCCL

NVIDIA · Semiconductors · Shanghai, China

Senior Software Engineer role focused on designing, implementing, and maintaining highly-optimized communication runtimes for Deep Learning frameworks and HPC programming interfaces on GPU clusters. This involves system software development, parallel programming interface contributions, and proof-of-concept creation for new designs and hardware features.

What you'd actually do

  1. Design, implement, and maintain highly optimized communication runtimes for Deep Learning frameworks (e.g. NCCL for TensorFlow/PyTorch) and HPC programming interfaces (e.g. UCX for MPI/OpenSHMEM) on GPU clusters.
  2. Participate in and contribute to parallel programming interface specifications such as MPI and OpenSHMEM.
  3. Design, implement, and maintain system software that enables interactions among GPUs and between GPUs and other system components.
  4. Create proofs of concept to evaluate and motivate extensions to programming models, new runtime designs, and new hardware features.

Skills

Required

  • C/C++ programming
  • Linux
  • computer system architecture
  • operating systems
  • parallel programming interfaces
  • communication runtimes

Nice to have

  • high-performance networks (InfiniBand, RoCE)
  • HPC applications
  • Deep Learning Frameworks (PyTorch, TensorFlow, JAX/XLA, vLLM/SGLang)
  • AI/DL communication patterns (expert, tensor, data, and pipeline parallelism: EP, TP, DP, PP)
  • CUDA kernel optimization and profiling
  • large-scale model training
  • production inference software stack

What the JD emphasized

  • Excellent C/C++ programming and debugging skills.
  • Expert understanding of computer system architecture and operating systems.
  • Experience with parallel programming interfaces and communication runtimes.

Other signals

  • GPU clusters
  • Deep Learning frameworks
  • HPC programming interfaces
  • parallel programming
  • communication runtimes