Senior System Software Architect, HPC and AI Networking

NVIDIA · Semiconductors · Beijing, China

NVIDIA is seeking a Senior System Software Architect to design and prototype scalable software systems for distributed AI training and inference, focusing on optimizing throughput, latency, and memory efficiency. The role involves developing and evaluating communication libraries, collaborating with AI framework teams, co-designing hardware features for AI acceleration, and contributing to runtime systems and protocol layers.

What you'd actually do

  1. Design and prototype scalable software systems that optimize distributed AI training and inference—focusing on throughput, latency, and memory efficiency.
  2. Develop and evaluate enhancements to communication libraries such as NCCL, UCX, and UCC, tailored to the unique demands of deep learning workloads.
  3. Collaborate with AI framework teams (e.g., TensorFlow, PyTorch, JAX) to improve integration, performance, and reliability of communication backends.
  4. Co-design hardware features (e.g., in GPUs, DPUs, or interconnects) that accelerate data movement and enable new capabilities for inference and model serving.
  5. Contribute to the evolution of runtime systems, communication libraries, and AI-specific protocol layers.

Skills

Required

  • Ph.D., Master's, or Bachelor's degree in computer science, computer engineering, electrical engineering, or a closely related field.
  • 5+ years of experience with DNNs, DNN scaling, parallelism in DNN frameworks, or deep learning training workloads.
  • Deep understanding of inference and training workloads and optimizations, such as prefill/decode, data parallelism, tensor parallelism, and FSDP.
  • Experience with AI network parallelism using collective libraries and RDMA/RoCE.
  • Background in algorithm design, system programming, and computer architecture.
  • Strong programming and software development skills.
  • Ability and flexibility to work and communicate effectively in a multi-national, multi-time-zone corporate environment.

Nice to have

  • Deep understanding of technology and passion for what you do.
  • Strong collaborative and interpersonal skills, specifically a proven ability to effectively guide and influence within a dynamic matrix environment.
  • Background in designing communication middleware for high-performance computing systems, including RoCE and DPUs.
  • Background in CUDA programming, NVIDIA GPUs, and programming models for emerging architectures.

What the JD emphasized

  • HPC and AI Inference Software Architect
  • distributed AI training
  • real-time inference
  • communication optimization
  • scalable AI infrastructure
  • Inference and Training workloads and optimizations
  • AI network parallelism
