Senior AI Networking System Architect

NVIDIA · Semiconductors · Tel Aviv, Israel +2

NVIDIA is seeking a Senior AI Networking System Architect to define and develop the architecture for next-generation NVL systems that power large-scale high-performance computing clusters for AI research and various industries. The role involves end-to-end system architecture, research across algorithms, software, firmware, and hardware, and developing simulation models for performance testing.

What you'd actually do

  1. Define the NVL system architecture end-to-end, based on internal and customer requirements, through all product life-cycle stages (pre-silicon, post-silicon, and deployment).
  2. Research various solutions to enable the next generation of large-scale, high-performance computing clusters. The position spans multiple layers, from algorithms and software to firmware and hardware.
  3. Develop models for simulation and performance testing, analyse the results, and apply them to the development of future HW and SW.
  4. Collaborate with cross-functional teams, including other architecture teams, logic design, system software, firmware, and research teams, to ensure the successful execution of the project.

Skills

Required

  • B.Sc., M.Sc., or Ph.D. degree in Computer Science, Computer Engineering, or Electrical Engineering
  • At least 5 years of industry or research experience in computer networks
  • Excellent understanding of large-scale network behaviour and the effect of distributed computing workloads on the network
  • Experience in development of simulation environments
  • Strong managerial, problem-solving, and critical-thinking skills
  • Ability to work and operate in a highly dynamic environment
  • Ability to partner with multiple groups in the organization

Nice to have

  • Strong understanding of network topologies and of network protocols such as InfiniBand, IP, TCP, and RoCE
  • Good knowledge of Python and C++
  • Good knowledge of AI models
  • Familiarity with HPC environments, routing algorithms, and the OMNeT++ and ns-3 simulation environments

What the JD emphasized

  • At least 5 years of industry or research experience in computer networks
  • Excellent understanding of large-scale network behaviour and the effect of distributed computing workloads on the network
  • Experience in development of simulation environments
  • Good knowledge of AI models

Other signals

  • AI computing
  • ML / AI computing
  • next-generation networks
  • large-scale, high-performance computing clusters
  • AI research
  • high-performance clusters