Senior Software Engineer - Verification AI Infrastructure

at NVIDIA · Industrial · Tel Hai, Israel

This Senior Software Engineer role focuses on building and optimizing scalable software automation systems with AI/ML integration for NVIDIA's Data Center environments. The role involves developing automation and validation tools, improving system performance, and troubleshooting complex issues in distributed systems.

What you'd actually do

  1. Build, develop, and optimize scalable software systems with a focus on AI/ML integration
  2. Build automation and validation tools simulating data center and HPC environments
  3. Collaborate with cross-functional teams to define requirements and deliver robust solutions
  4. Improve system performance, scalability, and reliability through architectural improvements
  5. Troubleshoot complex issues in distributed systems and improve observability

Skills

Required

  • Python
  • C++
  • PyTorch
  • TensorFlow
  • Data structures
  • Algorithms
  • System building
  • Linux environments
  • Debugging
  • Problem-solving
  • Communication skills

Nice to have

  • Embedded programming
  • Low-level C/C++
  • Networking protocols
  • TCP/IP
  • UDP
  • Optimizing AI workloads
  • Distributed environments
  • Edge environments
  • High-performance systems
  • AI-related systems

What the JD emphasized

  • 5+ years of hands-on software development experience
  • Experience with AI/ML frameworks such as PyTorch or TensorFlow

Other signals

  • AI/ML integration
  • AI workloads
  • AI frameworks

NVIDIA is seeking a dedicated Software Engineer to join the Vertical Verification Group. As a senior team member, you will help craft high-performing software automation systems for NVIDIA's Data Center environments. The role focuses on software development and solutions involving AI. You will collaborate with NIC, OS, Switch, HCA, CPU, and GPU compute teams, working closely with architects, network engineers, and developers. With skilled engineers worldwide, the work environment is dynamic, meaningful, and fast-paced. Are you ready for the challenge?

What you’ll be doing:

  • Build, develop, and optimize scalable software systems with a focus on AI/ML integration
  • Build automation and validation tools simulating data center and HPC environments
  • Collaborate with cross-functional teams to define requirements and deliver robust solutions
  • Improve system performance, scalability, and reliability through architectural improvements
  • Troubleshoot complex issues in distributed systems and improve observability
  • Participate in code reviews, build discussions, and continuous improvement efforts

What we need to see:

  • B.Sc. in Computer Science, Engineering, or a related field (or equivalent experience)
  • 5+ years of hands-on software development experience
  • Strong programming skills in Python, C++, or similar languages
  • Experience with AI/ML frameworks such as PyTorch or TensorFlow
  • Solid understanding of data structures, algorithms, and system building
  • Experience working in Linux environments
  • Strong debugging, problem-solving, and communication skills

Ways to stand out from the crowd:

  • Experience with embedded programming (low-level C/C++)
  • Familiarity with networking protocols (TCP/IP, UDP, etc.)
  • Experience optimizing AI workloads in distributed or edge environments
  • Contributions to high-performance or AI-related systems