Senior Software Advanced Developer

NVIDIA · Semiconductors · Yokneam, Israel (+2 other locations)

Develop and prototype advances in distributed training and inference on NVIDIA's Spectrum-X AI fabric. The work focuses on improving how AI applications interact with the network through communication optimization, congestion control, NIC firmware development, and switch SDK features. The goal is to make AI factories more efficient and to help large-scale AI systems scale further and run faster.

What you'd actually do

  1. Prototype end-to-end solutions to improve distributed training and disaggregated inference performance.
  2. Analyze and optimize communication flows across application, transport, and network layers.
  3. Develop system software spanning communication libraries, drivers, and firmware integrations.
  4. Collaborate with hardware, firmware, and SDK teams to co-design network features.
  5. Validate and integrate prototypes into NVIDIA’s AI infrastructure and products.

Skills

Required

  • BSc/MSc/PhD in Computer Science or Electrical Engineering
  • 5+ years of relevant experience
  • Deep understanding of networking and communication internals (NCCL, RDMA/RoCE, congestion control)
  • Hands-on experience with HW/SW/FW integration
  • Low-level programming (C/C++, kernel, drivers)
  • Background in distributed training systems (PyTorch DDP, Megatron-LM, DeepSpeed)

Nice to have

  • Demonstrated innovation and leadership in turning prototypes into impactful product features
  • Experience with programmable data planes (P4, eBPF, DOCA SDK, or switch SDKs)
  • Familiarity with NIC firmware scheduling, in-network compute, or congestion management
  • Contributions to open-source projects, academic papers, or performance benchmarking tools
  • Strong background in AI factory architectures, distributed inference, or network telemetry

What the JD emphasized

  • Deep understanding of networking and communication internals — NCCL, RDMA/RoCE, congestion control.
  • Hands-on experience with HW/SW/FW integration and low-level programming (C/C++, kernel, drivers).
  • Background in distributed training systems (such as PyTorch DDP, Megatron-LM, DeepSpeed).

Other signals

  • distributed training
  • inference performance
  • AI fabric
  • large-scale AI systems