Systems Quality and Reliability Engineer - LPU

NVIDIA · Semiconductors · Santa Clara, CA

This role focuses on systems quality and reliability engineering for NVIDIA's AI/ML products. It involves RMA (return merchandise authorization) and FA (failure analysis) debug, root-cause analysis, trend identification, and quality alert management. The engineer will oversee hardware quality performance, manage the operational performance of FA at contract manufacturers (CMs), and lead the setup of new products into failure analysis operations. While the role supports AI/ML products, its core function is hardware quality and reliability, not direct AI/ML model development or deployment.

What you'd actually do

  1. Conduct and lead debug and root-cause analysis of field RMAs. Collaborate with systems, hardware, software, and operations engineers as required
  2. Scale root cause FA capabilities within your organization
  3. Create FA result reports that align with the standard 8D process or a similar one
  4. Analyze RMA, FA, and repair data. Identify trends and raise quality alerts when necessary. Drive resolution, containment, and mitigation plans for those alerts
  5. Oversee hardware quality performance, monitoring field quality data and associated metrics including RMA rates, MTBF, and Reliability Ratio

Skills

Required

  • BS/MS in EE, Physics or a related degree (or equivalent experience)
  • 5+ years of hands-on systems test and/or validation engineering experience
  • Proven hands-on experience in systems quality and reliability engineering
  • Competence using lab equipment such as oscilloscopes, logic analyzers, and power analyzers
  • Experience enabling reliability tests such as HTOL and quality tests such as burn-in
  • Strong knowledge of fault isolation techniques such as OBIRCH, DLS/LADA, LVP, and LVI
  • Proficiency with high-speed interfaces (SerDes, PCIe, DDR)
  • Proficiency in Python, Perl, C++, or other languages on UNIX/Linux
  • Excellent knowledge of PCB card and system-level test and debug, and the ability to manage factory floor partners (CMs) for RMA/FA activities

Nice to have

  • Working knowledge of FA techniques and tools such as FIB, SEM, TDR, VNA, and CSAM

What the JD emphasized

  • 5+ years of hands-on systems test and/or validation engineering experience
  • Proven hands-on experience in systems quality and reliability engineering