Senior System Reliability Engineer

NVIDIA · Semiconductors · Taipei, Taiwan

NVIDIA is seeking a Senior System Reliability Engineer in Taiwan to focus on the reliability of printed circuit board assemblies (PCBAs) and server-level products, including HGX/DGX AI servers and racks. The role involves collaborating with partners and engineering teams, defining reliability standards, developing and executing reliability test plans, performing failure analysis, and correlating test results with field performance. It requires a BS or MS in a technical field, at least 5 years of hardware validation and system-level reliability experience, hands-on testing experience, an understanding of system interfaces, statistical analysis skills, and project management abilities. Fluency in Chinese and English is essential.

What you'd actually do

  1. Collaborate closely with CM/ODM partners, relevant engineering teams, and suppliers to ensure target reliability is achieved through Design for Reliability (DfR) methodologies, including FMEA and DoE.
  2. Define, implement, and maintain product reliability standards and metrics for NVIDIA’s next-generation system technologies, using existing tools and processes or developing new ones as needed.
  3. Develop and own the reliability test plan. Lead and complete reliability testing, including failure analysis, and provide actionable recommendations to improve product design and manufacturing processes.
  4. Develop and present methodologies to correlate reliability test results with real-world field performance.

Skills

Required

  • BS or MS in Electrical, Mechanical, Computer Engineering, or other technical fields
  • Minimum of 5 years of experience in hardware validation and system-level reliability for printed circuit boards and server platforms
  • Hands-on experience with reliability demonstration testing and accelerated life methodologies
  • Solid understanding of power delivery, memory subsystems, high-speed I/O, PCI Express, Ethernet, and I²C interfaces
  • Strong proficiency in statistical analysis and reliability modeling
  • Demonstrated project management skills
  • Fluency in both Chinese and English

Nice to have

  • Advanced degree preferred
  • Experience using existing tools and processes, or developing new ones as needed
  • Ability to present technical information clearly to diverse audiences
  • Highly self-motivated, capable of working independently, and driven to deliver results in a fast-paced environment

What the JD emphasized

  • Minimum of 5 years of experience in hardware validation and system-level reliability for printed circuit boards and server platforms
  • Hands-on experience with reliability demonstration testing and accelerated life methodologies, including Thermal Cycling, Mechanical Shock and Vibration, ALT/HALT/HASS, Burn-in, and Ongoing Reliability Testing (ORT) at component, subassembly, and system levels