System Administrator – Advanced Data Center and AI Infrastructure

NVIDIA · Semiconductors · Yokneam, Israel

NVIDIA is seeking a System Administrator to deploy and maintain bare-metal and multi-node environments running NVIDIA networking, DGX, and advanced computing systems. The role focuses on firmware validation infrastructure, BMC management, regression lab automation, and ensuring the continuous availability of critical test platforms. The ideal candidate has deep Linux expertise, experience with infrastructure-as-code and monitoring at scale, and a proven background in automating system administration tasks for AI and HPC infrastructure.

What you'd actually do

  1. Deploy, configure, and maintain NVIDIA DGX, GB, and HPC systems within our data center.
  2. Monitor and ensure system health through preventive maintenance, upgrades, patching, and issue resolution in both physical and virtual environments.
  3. Implement and update automation for efficient AI and HPC administration via Bash and Python scripting.
  4. Lead integration, onboarding, and optimization for new hardware and edge technologies alongside cross-functional teams.
  5. Provide technical support and collaborate to enable rapid deployment and system bring-up of new technologies.

Skills

Required

  • Linux server environments
  • NVIDIA DGX and GB, or HPC clusters
  • system architecture
  • networking fundamentals
  • enterprise storage operations
  • automating system administration tasks
  • Bash scripting
  • Python scripting

Nice to have

  • cluster management
  • platform monitoring
  • high-performance and GPU-accelerated environments
  • rack installation
  • high-density physical infrastructure
  • scalable solutions
  • troubleshooting skills

What the JD emphasized

  • At least 3 years' experience as a System Administrator handling large-scale data center, HPC, or AI infrastructure deployments.
  • Demonstrated experience automating system administration tasks and improving workflows for AI and HPC infrastructure.