Principal Firmware Engineer - Data Center Server Management

NVIDIA · Semiconductors · Santa Clara, CA +1 · Remote

NVIDIA is seeking a Principal Firmware Engineer to own the end-to-end manageability architecture for data center server management solutions, specifically for next-generation AI supercomputing platforms at scale. The role involves driving server management for large GPU clusters, collaborating with internal and external teams to define requirements, and ensuring the quality, reliability, and telemetry performance of firmware delivered to data centers.

What you'd actually do

  1. Drive server management for large clusters and data centers deploying NVIDIA GPUs and Grace solutions.
  2. Work with data center architects and cloud customers to narrow down implementation requirements and enable speed-of-light product development.
  3. Work with internal teams to make sure requirements are designed and implemented the right way in each firmware and software module.
  4. Collaborate with other leads to design and build the data center health management workflow.
  5. Drive reliability and optimization in firmware architecture from a data center viewpoint.

Skills

Required

  • server firmware (BMC)
  • platform software development
  • data center health management workflow
  • server architecture
  • server manageability
  • C/C++
  • Python
  • programming and debugging skills for server platforms
  • SCM (e.g. Git, Perforce)
  • project management tools like Jira

Nice to have

  • x86 or ARM system architecture
  • technical leadership

What the JD emphasized

  • 15+ years of relevant experience working on server firmware (BMC) and platform software development
  • BS, MS, or PhD in EE/CS or a related field, or equivalent experience
  • Hands-on experience with data center health management workflow
  • Proven record of delivering server firmware for large data centers
  • Strong knowledge of data center management, server architecture, and server manageability in data centers, plus strong, demonstrable skill in C/C++ and Python
  • Programming and debugging experience on server platforms
  • Experience with SCM (e.g. Git, Perforce) and project management tools like Jira