Senior System Software Architect, AI and GPU Networking

NVIDIA · Semiconductors · Beijing, China +1

This role focuses on architecting and enhancing NVIDIA's GPU Networking offerings to accelerate AI workloads, including distributed AI, deep learning, inference, and model serving. It involves co-designing hardware features and leading the architecture and design of new technologies for AI data centers.

What you'd actually do

  1. Enhance NVIDIA's GPU Networking offerings, such as NVIDIA Dynamo, NVIDIA NIXL and NVIDIA UCX, to accelerate AI workloads and meet their unique requirements.
  2. Co-design hardware features (e.g., in GPUs, DPUs, or interconnects) that accelerate data movement and enable new capabilities for inference and model serving.
  3. Identify and evaluate new technologies, innovations and partner relationships for alignment with our technology roadmap and business value.
  4. Lead architecture and design of new technologies and innovations such as runtime systems, communication libraries, and AI-specific technologies.
  5. Lead proof-of-concept development to evaluate and drive such technologies.

Skills

Required

  • system architecture
  • AI systems architecture
  • scaling of AI
  • Parallelism of AI frameworks
  • deep learning training workloads
  • algorithm design
  • system programming
  • computer architecture
  • operating systems
  • virtualization
  • networking
  • storage
  • performance profiling and optimization techniques
  • defining and using hardware features
  • programming and software development skills
  • communication in a multinational, multi-time-zone corporate environment

Nice to have

  • research track record
  • system architecture
  • CPU/GPU/memory/storage/networking
  • communication skills
  • Deep Learning frameworks
  • AI communication libraries (NCCL, UCX, MPI and equivalents)
  • Inference and Training workloads and optimizations
  • Prefill/Decode
  • data parallelism
  • Tensor parallelism
  • FSDP

What the JD emphasized

  • accelerate networking
  • AI data centers
  • distributed AI
  • deep learning solutions
  • inference
  • model serving
  • performance profiling and optimization techniques
