Principal System Software Engineer - Data Center Mods

NVIDIA · Semiconductors · Santa Clara, CA +1 · Remote

NVIDIA is seeking a Principal System Software Engineer to architect and scale next-generation diagnostic systems for Cloud Service Providers (CSPs), focusing on AI accelerator products. The role involves defining technical roadmaps, leading cross-functional development, and deploying robust diagnostic frameworks. Expertise in distributed systems, hardware/software interfaces, C++, Python, and system software is required.

What you'd actually do

  1. Define the technical strategy for, and lead the development of, NVIDIA’s Data Center diagnostic systems, orchestrating large-scale stress testing for CPUs, GPUs, networking, memory, and high-speed interconnects.
  2. Mentor and grow engineering teams, providing technical leadership and encouraging a culture of innovation and excellence.
  3. Drive root-cause analysis of systemic failures that span multiple hardware and software domains.
  4. Partner with CSPs to diagnose and address scalability challenges within their unique data center infrastructures.

Skills

Required

  • Bachelor's degree in Computer Science/Engineering, Electrical Engineering, or a related field (or equivalent experience)
  • 15+ years of system software experience working on highly resilient distributed systems
  • Programming experience in C++ or Python
  • Deep systems knowledge of x86/ARM architectures, Linux OS internals, firmware (UEFI/BIOS), Redfish, HMC and BMC protocols, and platform security
  • Expertise in software testing methodologies with an automation-led, AI-first approach to ensuring software quality

Nice to have

  • Technical leadership: leading project teams and setting technical direction

What the JD emphasized

  • AI accelerator products
  • AI-first approach to ensuring software quality
  • Technical leadership: leading project teams and setting technical direction