Principal Firmware Engineer – Server Manageability and Observability

NVIDIA · Semiconductors · Santa Clara, CA

NVIDIA is seeking a Principal Firmware Engineer to lead the end-to-end system software architecture for its data center systems, including firmware, kernel drivers, and operating systems. The role combines technical leadership, customer engagement, and strategic collaboration with hyperscalers to architect next-generation products.

What you'd actually do

  1. Serve as the primary technical point of contact for major customers, leading technological discussions, defining KPIs, gathering requirements, and addressing complex technical queries.
  2. As a system software architect, lead technical innovation and strategic collaborations with major hyperscalers to architect next-generation data center products.
  3. Align NVIDIA's roadmap with major customers' requirements through direct engagement.
  4. Develop and drive adoption of new technologies and protocols.
  5. Make critical technical decisions in ambiguous situations, mitigating risks through left-shift strategies.

Skills

Required

  • System architecture
  • Firmware
  • Embedded systems
  • Linux kernel
  • Server management protocols
  • Networking technologies
  • Cross-functional project leadership

Nice to have

  • Cloud and cluster level deployment and management systems
  • OCP and DMTF standards
  • NVIDIA HPC programming models and libraries (CUDA, cuDNN, DOCA)
  • Enterprise storage architectures
  • Distributed parallel processing paradigms

What the JD emphasized

  • Deep expertise in scalable and performant server system architecture, focusing on SW/HW interfaces.
  • Extensive experience with complex system software for accelerators (GPUs, DPUs, FPGAs).
  • Mastery of system firmware (SBIOS, OpenBMC), embedded systems, and Linux kernel internals.
  • Proficiency in Out-of-Band and In-Band management architectures, device management protocols (e.g., MCTP, PLDM, SPDM, RDE) and system management protocols (Redfish, IPMI).
  • Extensive knowledge of networking technologies and protocols, including TCP/IP, Ethernet, and InfiniBand, as well as advanced switching and routing concepts.
  • Experience collaborating with platform security experts to define tradeoffs between security and ease of use.
  • Demonstrated success in leading complex, cross-functional projects to completion, showcasing the ability to influence and achieve results without direct authority in large-scale, collaborative environments.
  • Demonstrable experience implementing a left-shift strategy to de-risk program execution.
  • 15+ years of experience in system architecture and design.