Senior Systems Engineer, Artificial Intelligence Operations

NVIDIA · Semiconductors · Tel Aviv, Israel +2

This role focuses on building an AI platform for operating AI factories, improving AI cluster resiliency, and designing AIOps-based solutions. The engineer will develop automated workflows for issue detection and root cause analysis, and collaborate with operators to debug complex AI cluster problems. The role also involves customer-facing activities such as technical presentations, demos, training, and evaluation deployments.

What you'd actually do

  1. Gather and understand internal and external customer requirements to improve AI cluster resiliency, and design AIOps-based solutions that address these needs.
  2. Develop automated workflows for issue detection and root cause analysis, and collaborate closely with operators to debug sophisticated, full-stack AI cluster problems.
  3. Deliver compelling technical presentations and lead hands-on demos or training.
  4. Handle evaluation deployments (POC/POV) and ensure smooth, reliable installations by staying engaged and supportive throughout the customer journey.

Skills

Required

  • Bachelor of Science or equivalent experience
  • 12+ years of networking experience in enterprise or service provider environments
  • Strong hands-on expertise in routing and switching
  • Proficient in scripting and automation using Python or similar languages
  • Strong Linux expertise
  • Proven experience in Systems Engineer or SRE roles, working directly with customers to resolve issues and ensure their success
  • Exceptional oral, written, and presentation skills
  • Demonstrated ability to collaborate effectively across teams

Nice to have

  • Experience with data center infrastructure and cloud architectures
  • Background in network performance monitoring or observability
  • Previous experience working at a technology start-up

What the JD emphasized

  • 12+ years of networking experience in enterprise or service provider environments
  • Strong hands-on expertise in routing and switching
  • Proven experience working directly with customers to resolve issues and ensure success in Systems Engineer or SRE roles