Senior Software Engineer - Server Manageability

NVIDIA · Semiconductors · CA +5 · Remote

This role centers on designing, implementing, and delivering innovations for managing GPU-based AI servers, with emphasis on out-of-band (OOB) management, firmware development, server architecture, and building enterprise systems. It involves leading BMC firmware design, developing performance-optimized monitoring solutions, and ensuring software quality and security. The role requires deep expertise in BMC firmware development, low-level hardware interfaces, and C/C++ development in embedded Linux environments, along with a strong track record of end-to-end delivery of enterprise servers.

What you'd actually do

  1. Designing, implementing, and delivering innovations for managing GPU-based AI servers, with a focus on OOB management, firmware development, server architecture, and building enterprise systems.
  2. Leading BMC firmware design with a global team of engineers.
  3. Designing and developing performance-optimized active monitoring BMC solutions using DMTF standards, including the MCTP, Redfish, SPDM, and PLDM specifications.
  4. Instrumenting code to maximize code coverage, writing and automating unit tests for each implemented module, and maintaining detailed unit test case reports.
  5. Providing software quality reports based on static analysis, code coverage, and CPU load.

Skills

Required

  • BMC firmware development
  • x86 or ARM platforms
  • BMC-BIOS communication
  • Thermal management
  • Power management
  • Firmware update
  • Device monitoring
  • Firmware security
  • End-to-end delivery of high-end enterprise servers
  • Low-level interfaces between SBIOS, BMC, and OS
  • I2C/SPI/PCIe/JTAG
  • PCIe enumeration
  • Platform-level I/O
  • Working closely with HW teams, ODMs, and vendors
  • C/C++ development
  • Bash/Python scripting
  • Debugging in embedded Linux
  • Bachelor’s degree, Master’s degree, or PhD in Electrical Engineering or Computer Science (or equivalent experience), plus 5+ years of experience

Nice to have

  • Contributions to industry standards and open-source projects such as Open Compute, IPMI, DMTF standards, and OpenBMC
  • Proven record of delivering BMCs for enterprise servers on the OpenBMC firmware stack

What the JD emphasized

  • Domain expertise in BMC firmware development on x86 or ARM platforms, including BMC-BIOS communication, thermal management, power management, firmware update, device monitoring, and firmware security.
  • Solid experience in end-to-end delivery of high-end enterprise servers, from definition to customer deployment.
  • Solid understanding of low-level interfaces between SBIOS, BMC, and OS (such as I2C, SPI, PCIe, and JTAG), as well as PCIe enumeration and platform-level I/O for enterprise systems.
  • Experience working closely with HW teams, ODMs and vendors to introduce and support server platforms.
  • Experience with C/C++ development, Bash/Python scripting, and debugging in embedded Linux operating environments.