Manager, Software Engineering - NCCL

NVIDIA · Semiconductors · Shanghai, China

Manager, Software Engineering for GPU Communications Libraries (NCCL) at NVIDIA, focusing on Deep Learning and HPC. The role involves leading a China-based engineering team, managing execution, customer interaction, roadmap definition, and contributing to feature design. Requires significant experience in software industry management, systems software, networking, and C/C++ programming.

What you'd actually do

  1. Lead, mentor, and grow our China engineering team. Own the end-to-end execution spanning planning, prioritization, quality control and performance.
  2. Interact with customers and researchers to understand their use cases and requirements. Collaborate with engineering, program and product management, and partners to define the product roadmap.
  3. Contribute to feature design and implementation.
  4. Continuously review and identify improvement opportunities in established processes, infrastructure, and practices to ensure the teams are accomplishing work in the most efficient and transparent manner.

Skills

Required

  • software industry management
  • systems software
  • communication runtimes
  • high performance networking
  • computer systems architecture
  • networking technologies
  • operating systems principles
  • HW-SW interactions
  • performance analysis/optimizations
  • C/C++ programming
  • debugging skills
  • Linux

Nice to have

  • Active user or developer of NCCL
  • Customer engagement experience
  • parallel programming models
  • communication runtime
  • CUDA
  • MPI
  • OpenMP
  • OpenACC
  • pthreads
  • HPC fundamentals
  • ML/DL fundamentals
  • Deep Learning Frameworks

What the JD emphasized

  • 10+ years of overall experience in the software industry, including 4+ years of management experience
  • Bachelor's, Master's, or Ph.D. in CS, CE, EE, or a related technical field, or equivalent experience
  • Specialization in systems software, communication runtimes, or high performance networking
  • Proven success in managing several complex initiatives or products through the full product life cycle
  • Strong understanding of computer systems architecture, networking technologies (RDMA, RoCE, Ethernet, EFA, InfiniBand) and topologies, operating systems principles (systems software fundamentals), HW-SW interactions, and performance analysis and optimization
  • Hands-on C/C++ programming and debugging skills in Linux