Senior High-Performance System Architect

NVIDIA · Semiconductors · Tel Aviv, Israel +2

NVIDIA is seeking a Senior High-Performance System Architect to define and research NVL system architecture for large-scale, high-performance computing clusters used to train advanced AI models. The role involves working across algorithms, software, firmware, and hardware, collaborating with cross-functional teams, and analyzing simulation results.

What you'd actually do

  1. Define the NVL system architecture end-to-end, based on internal and customer requirements, across the full product life cycle (pre-silicon, post-silicon, and in deployment).
  2. Research a variety of solutions to enable the next large-scale, high-performance computing clusters. The position spans multiple layers: algorithms, software, firmware, and hardware.
  3. Collaborate with cross-functional teams, including other architecture teams, logic design, system software, firmware, and research teams, to ensure the successful execution of the project.

Skills

Required

  • B.Sc., M.Sc., or Ph.D. degree in Computer Science, Computer Engineering, or Electrical Engineering.
  • At least 5 years of industry or research experience in computer networks.
  • Excellent understanding of large-scale network behavior and the effect of distributed computing workloads on the network.
  • Experience in developing simulation models, analyzing simulation results, and developing optimization algorithms.
  • Strong managerial, problem-solving, and critical-thinking skills.
  • Ability to work and operate in a highly dynamic environment.
  • Ability to partner with multiple groups in the organization.

Nice to have

  • Good knowledge of network protocols (such as InfiniBand, IP, TCP, and RoCE) and network topologies.
  • Good knowledge of Python and C++.
  • Familiarity with HPC environments, routing algorithms, and the OMNeT++ and ns-3 simulation environments.

What the JD emphasized

  • large-scale, high-performance computing clusters
  • train the most advanced AI models

Other signals

  • high-performance computing
  • AI model training
  • next-generation networks