Senior System Software Architect, AI and GPU Networking

NVIDIA · Semiconductors · Tel Aviv, Israel +1

This role focuses on architecting and optimizing NVIDIA's GPU Networking offerings for AI workloads, including distributed AI, deep learning, inference, and model serving. It involves co-designing hardware features and leading the architecture and development of new technologies and runtime systems for AI data centers.

What you'd actually do

  1. Enhance NVIDIA's GPU Networking offerings, such as NVIDIA Dynamo, NVIDIA NIXL and NVIDIA UCX, to accelerate AI workloads and meet their unique requirements.
  2. Co-design hardware features (e.g., in GPUs, DPUs, or interconnects) that accelerate data movement and enable new capabilities for inference and model serving.
  3. Identify and evaluate new technologies, innovations and partner relationships for alignment with our technology roadmap and business value.
  4. Lead architecture and design of new technologies and innovations such as runtime systems, communication libraries, and AI-specific technologies.
  5. Lead proof-of-concept development to evaluate and drive such technologies.

Skills

Required

  • M.Sc. or Ph.D. in Computer Science, Electrical or Computer Engineering (or equivalent experience)
  • 5+ years of industry experience in system architecture, AI systems architecture, scaling of AI, parallelism of AI frameworks, or deep learning training workloads
  • Algorithm design
  • System programming
  • Computer architecture
  • Operating systems
  • Virtualization
  • Networking
  • Storage
  • Performance profiling and optimization techniques
  • Hardware features
  • Programming and software development skills
  • Ability and flexibility to work and communicate effectively in a multi-national, multi-time-zone corporate environment

Nice to have

  • Proven research track record
  • Experience with and passion for system architecture across CPU, GPU, memory, storage, and networking
  • Stellar communication skills
  • Knowledge of deep learning frameworks and AI communication libraries (NCCL, UCX, MPI and equivalents)
  • Deep understanding of inference and training workloads and optimizations, such as prefill/decode, data parallelism, tensor parallelism, FSDP and others

What the JD emphasized

  • 5+ years of industry experience (or equivalent) in system architecture, AI systems architecture, scaling of AI, parallelism of AI frameworks, or deep learning training workloads.
  • Deep understanding of performance profiling and optimization techniques, together with defining and using hardware features.
  • Deep understanding of inference and training workloads and optimizations, such as prefill/decode, data parallelism, tensor parallelism, FSDP and others.

Other signals

  • AI data centers
  • distributed AI and deep learning solutions
  • inference and model serving
  • GPU Networking