System Speed and Reliability Co-Design Engineer

NVIDIA · Semiconductors · Santa Clara, CA

NVIDIA is seeking a System Speed and Reliability Co-Design Engineer to join its SCG team. The role involves co-designing system-level speed features, building validation and automation infrastructure, and debugging complex silicon issues. The engineer will collaborate cross-functionally, define system specifications and requirements, and develop validation techniques. A key part of the role is designing and implementing automation tools for system speed modeling, applying AI- and LLM-assisted workflows to shorten characterization and debug cycles. The position requires experience with silicon bring-up, PPA analysis, and scripting, plus demonstrated use of AI/LLM tools in an engineering workflow.

What you'd actually do

  1. Design and implement automation tools for system speed modeling; apply AI- and LLM-assisted workflows (e.g., automated log analysis, pattern detection, scripting acceleration) to compress characterization and debug cycles.
  2. Lead debug of complex silicon and system-level issues, including show-stopper defects, to enable on-time product shipment.
  3. Collaborate cross-functionally with system architects, hardware, firmware/software, process/reliability, and operations teams to co-design system-level speed features and deliver industry-defining products.
  4. Define, prototype, and refine pre- and post-silicon bring-up flows to ensure product quality, performance, and schedule efficiency.
  5. Perform closed-loop validation by correlating silicon behavior against timing simulation and design expectations; provide actionable feedback to improve future designs.

Skills

Required

  • MS in EE, CE, Systems Engineering, or equivalent experience.
  • 4+ years of experience in a related hardware engineering role.
  • Hands-on experience with silicon bring-up, frequency and power characterization, PPA analysis in pre- and post-silicon phases, system- and platform-level understanding, tester-to-system correlation, and lab instrumentation (oscilloscopes, multimeters, DAQs).
  • Scripting proficiency in Python and/or Perl; comfortable in Windows, Linux, and Android environments.
  • Familiarity with statistical methods and data analysis tools (JMP or equivalent).
  • Demonstrated use of AI or LLM-based tools (e.g., Claude, Copilot, ChatGPT) in an engineering workflow—scripting acceleration, log triage, data analysis—with clear judgment about output validation and where automation introduces risk.

Nice to have

  • Background in gaming, automotive, or datacenter segments.
  • Experience building or deploying AI-assisted characterization, log analysis, or debug automation workflows in a production silicon environment.
  • Familiarity with LLM evaluation, prompt engineering, or agentic scripting pipelines applied to silicon data analysis.

What the JD emphasized

  • Demonstrated use of AI or LLM-based tools (e.g., Claude, Copilot, ChatGPT) in an engineering workflow—scripting acceleration, log triage, data analysis—with clear judgment about output validation and where automation introduces risk.