Senior Software Engineer, Data Center Workloads – Infrastructure

NVIDIA · Semiconductors · Yokneam, Israel

This Senior Software Engineer role centers on developing and executing software-driven characterization workflows for AI workloads on NVIDIA rack-scale systems. It involves analyzing, characterizing, and optimizing power, performance, and drive behavior across the full stack, including GPUs, CPUs, networking, and system software. Key responsibilities include building automated frameworks for data collection and analysis, investigating system behavior, and supporting new platform bring-up.

What you'd actually do

  1. Develop and run software tools, automation, and workloads to characterize power, performance, and drive behavior across NVIDIA rack-scale systems.
  2. Execute AI and system-level workloads to stress and evaluate behavior across the stack, including GPUs, CPUs, networking, storage, firmware, drivers, and system software.
  3. Build automated frameworks for data collection, telemetry, validation, correlation, and analysis of characterization results.
  4. Investigate system behavior under different workloads and operating conditions to identify bottlenecks, anomalies, and optimization opportunities.
  5. Work closely with hardware, firmware, driver, system software, performance, and validation teams to define characterization methodologies and debug cross-stack issues.

Skills

Required

  • Python
  • C/C++
  • software engineering
  • system software
  • infrastructure
  • validation
  • performance optimization
  • automation
  • test infrastructure
  • Linux
  • scripting
  • telemetry
  • data analysis
  • debugging
  • problem-solving

Nice to have

  • NVIDIA platforms
  • GPU systems
  • rack-scale AI infrastructure
  • power characterization
  • thermal characterization
  • storage/drive characterization
  • workload automation
  • cluster orchestration
  • lab infrastructure
  • AI benchmarks
  • training workloads
  • inference workloads
  • system stress methodologies
  • post-silicon validation
  • production testing
  • system bring-up

What the JD emphasized

  • 5+ years of software engineering experience
  • Strong programming skills in Python and at least one system-level language such as C/C++
  • Experience developing automation and test infrastructure for complex hardware/software systems
  • Hands-on experience running, debugging, or optimizing AI, HPC, or large-scale system workloads

Other signals

  • running AI workloads
  • characterize power, performance, and drive behavior at the system level
  • work at the intersection of software, infrastructure, silicon, and large-scale AI platforms
  • support bring-up, validation, and readiness activities for new rack-scale platforms and AI infrastructure