Lead System Software Engineer Platform - Server Embedded Firmware

NVIDIA · Semiconductors · Santa Clara, CA

Lead System Software Engineer focusing on embedded firmware for GPU server platforms, specifically microcontroller firmware development using C/C++ in an RTOS environment. Responsibilities include designing, implementing, debugging, and optimizing firmware for server manageability, communication protocols, and system integration, collaborating with hardware and security teams.

What you'd actually do

  1. Design and implement microcontroller firmware for GPU server platforms, focusing on, but not limited to, ARM M-class microcontrollers.
  2. Develop C/C++ server manageability features in an embedded-optimized RTOS environment.
  3. Perform hands-on work with microcontroller firmware bring-up, debugging, performance analysis, and coding manageability features for NVIDIA’s server platforms.
  4. Develop embedded management software to enable reporting and connectivity between server management devices.
  5. Implement register-based communication and DMTF standard messaging protocols for seamless interaction between BMC, GPUs, switches, memory, I/O expanders, sensors, and local microcontroller peripherals.

Skills

Required

  • Bachelor of Science Degree (or higher) in Electrical Engineering or Computer Science or equivalent experience
  • 12+ years of experience in low-level firmware development on embedded microcontrollers using Zephyr or FreeRTOS
  • Developing BMC and/or microcontroller firmware for managing CPU, GPU, Network and Storage Devices
  • embedded interfaces - USB and I3C
  • ARM Integrated Development Environments (IDE), debuggers, logic and protocol analyzers, and oscilloscopes
  • interrupt schemes, multi-threading, DMA, memory management, and working in resource-restricted embedded environments
  • embedded programming and scripting skills using C/C++, Bash, Python, and Go
  • reviewing and using hardware schematics, reference manuals, and datasheets
  • server manageability protocols such as MCTP, PLDM, SPDM, SMBus, and OCP recovery
  • Linux fundamentals

Nice to have

  • microcontroller embedded firmware development and OOB management
  • implementing MCTP stack in embedded environments or FPGA
  • contributions to industry groups such as Open Compute, OpenBMC, and DMTF, and to open source projects
  • system software and platform security for x86/ARM based Rack/Blade server systems

What the JD emphasized

  • 12+ years of experience in low-level firmware development on embedded microcontrollers using Zephyr or FreeRTOS
  • Demonstrated experience in developing BMC and/or microcontroller firmware for managing CPU, GPU, Network and Storage Devices.
  • Expertise working with server manageability protocols such as MCTP, PLDM, SPDM, SMBus, and OCP recovery.