Senior Debug System Engineer, Datacenter

NVIDIA · Semiconductors · Santa Clara, CA

NVIDIA is seeking a Senior Debug System Engineer for its datacenter product engineering team. The role focuses on failure analysis and root cause identification for GPU server products during the New Product Introduction (NPI) phase, covering hardware, software, and firmware. The engineer will collaborate with partner teams to ensure product quality and a smooth transition from development to mass production.

What you'd actually do

  1. Perform failure analysis (FA) on GPU baseboards and servers at the rack, system, and/or component level (integration levels L6 through L11/rack level).
  2. Analyze logs and failures that may span hardware (HW), software (SW), and firmware (FW), and propose debug and mitigation strategies.
  3. Design experiments and collect and analyze data to determine the root cause of failures.
  4. Provide root cause and corrective action plans in a timely manner, and write clear, complete reports detailing the steps taken and the findings.
  5. Develop debug guides for partner teams.

Skills

Required

  • failure analysis
  • debug experience
  • motherboards
  • graphics cards
  • servers
  • PCs
  • datacenter products
  • hardware
  • software
  • firmware
  • oscilloscopes
  • analyzers
  • Electrical Engineering

Nice to have

  • DFx enabling
  • characterization equipment

What the JD emphasized

  • 12+ years of work experience in a related field
  • Bachelor’s or Master’s degree in Electrical Engineering, or related field (or equivalent experience)
  • Excellent failure analysis or debug experience on motherboards, graphics cards, servers, PCs, or datacenter products
  • Proven understanding of and strong skills in one or more of the following areas: Hardware, Software, Component, Process, Test, Validation