Principal Architect, AI Networking

NVIDIA · Semiconductors · Santa Clara, CA +4 · Remote

This role leads the research agenda and architectural direction for NVIDIA's AI networking systems, focusing on high-performance communication at scale. It involves original research, hardware-software co-optimization, and integrating networking into AI serving stacks, with a requirement to publish findings and ship production-grade software.

What you'd actually do

  1. Setting the long-term technical vision for distributed AI communication systems—GPU-to-GPU, GPU-to-storage, and cross-node data movement.
  2. Conducting original research and prototyping next-generation networking solutions over RDMA, NVLink, and GPUDirect.
  3. Driving hardware-software co-optimization across GPUs, DPUs, NICs, and network switches, and investigating fundamental bottlenecks in communication runtimes for large-scale AI workloads (KV cache transfer, disaggregated prefill/decode, model parallelism).
  4. Integrating networking capabilities into AI serving stacks such as vLLM, SGLang, and TensorRT-LLM.
  5. Publishing findings, representing NVIDIA in industry forums and standards bodies, and mentoring senior engineers across the organization.

Skills

Required

  • 15+ years in systems software and/or networking, with deep expertise in high-performance networking (InfiniBand, RoCE, RDMA, NVLink), communication libraries (e.g., NIXL, NCCL, UCX, MPI, NVSHMEM), and GPU-accelerated systems.
  • MS, PhD or equivalent experience in Computer Science, Computer Engineering, Electrical Engineering, or a related field.
  • Deep understanding of computer architecture, memory hierarchies, DMA engines, and OS-level networking.
  • Understanding of ML systems concepts—transformer architectures, KV cache mechanics, model parallelism, or distributed training and inference patterns.
  • Proficiency in programming languages such as C, C++, Rust, and Python.

Nice to have

  • Knowledge of ML inference frameworks (vLLM, SGLang, TensorRT-LLM) and their communication requirements.
  • CUDA programming and NVIDIA GPU architecture expertise.
  • Proven experience influencing product strategy and technical roadmaps at a senior level.
  • Major open-source contributions.

What the JD emphasized

  • Track record of defining and delivering complex, cross-team technical initiatives from research concept to production.
  • Deep understanding of computer architecture, memory hierarchies, DMA engines, and OS-level networking.
  • Understanding of ML systems concepts—transformer architectures, KV cache mechanics, model parallelism, or distributed training and inference patterns.

Other signals

  • research agenda
  • architectural direction
  • publish original work
  • translate research breakthroughs into production-grade software