Senior SoC Architect, RAS Analysis and Diagnostics

NVIDIA · Semiconductors · Santa Clara, CA +2

Senior SoC Architect role focused on designing and implementing diagnostic testing for Reliability, Availability, Serviceability (RAS) in NVIDIA's Tegra System-on-Chips (SoCs) for datacenters and autonomous vehicles. The role involves developing methods to detect hardware failures, creating tests, assisting in failure analysis, and supporting post-silicon validation and documentation. While the company and its products are heavily involved in AI, this specific role focuses on the underlying hardware architecture and diagnostics rather than direct AI/ML model development.

What you'd actually do

  1. Work with RAS Architects to develop architecture and methods to detect faults that develop in our SoCs deployed in datacenters and autonomous vehicles.
  2. Work with software, design verification and silicon validation teams to develop tests that achieve high diagnostic coverage for the design.
  3. Assist in Failure Analysis to debug failing parts in the field and develop screens to detect and localize failures.
  4. Create hardware specifications, test plans, and architectural models in SystemC (where applicable).
  5. Plan and execute architectural validation plans.

Skills

Required

  • MS or PhD degree in computer or electrical engineering or equivalent experience
  • SoC architecture, design and/or verification experience
  • Reliability, Availability, Serviceability (RAS) in the SoC context
  • Defining SoC architecture areas (RAS, Clocks, Resets, Debug, Automotive safety, Interconnects, Memory Controller, IO technologies, Platform integration, In-System Test)
  • Hands-on design verification experience
  • Coverage analysis and optimization methodology
  • Communicating and solving issues at all levels of architecture definition
  • Analytical, written, and verbal interpersonal skills
  • Teamwork

Nice to have

  • History of debugging hardware failures in the lab
  • Familiarity with Design for Debug, Design for Test flows
  • Past research or publications in the area of RAS
  • SystemC or C++ development skills
  • Python or other relevant programming experience

What the JD emphasized

  • At least 8 years of SoC architecture, design and/or verification experience
  • Good understanding of Reliability, Availability, Serviceability (RAS) in the SoC context
  • You have meaningful industry expertise in defining one or more of the following SoC architecture areas - RAS, Clocks, Resets, Debug, Automotive safety, Interconnects, Memory Controller, IO technologies, Platform integration, In-System Test.