Failure Analysis Engineer - Power & Design

AMD · Semiconductors · Secaucus, NJ · Engineering

Failure Analysis Engineer - Power & Design at AMD, focusing on GPU accelerators. Responsibilities include PCB triage, power delivery debug, board-level fault isolation, developing debug strategies, running diagnostics, and collaborating with design, validation, FW, and manufacturing teams to accelerate root cause analysis and corrective actions. The role requires strong electrical engineering fundamentals and expertise in hardware design, board bring-up, and electrical debug.

What you'd actually do

  1. Support internal and external requests to troubleshoot AMD GPU product failures with primary focus on PCB triage, power delivery debug, and board-level failure isolation for continuous yield, quality, and customer support improvements.
  2. Develop and execute diagnostics and functional test DOEs to reproduce, characterize, and isolate difficult board- and power-related failures.
  3. Develop automation and tools to run tests and analyze results and logs.
  4. Perform structured PCB triage by narrowing failures to the board, component, power rail, layout interaction, or system integration level, and work with the contract manufacturer and internal AMD teams to reproduce failures, isolate root cause, and determine the most effective next steps for debug and corrective action.
  5. Use board schematics, layout data, and power delivery design knowledge to understand circuit behavior, trace power and signal paths, form debug hypotheses, and build targeted validation plans that drive efficient fault isolation and high-quality failure analysis.

Skills

Required

  • Python
  • shell scripting
  • Windows
  • Linux
  • firmware
  • drivers
  • hardware verification
  • system integration
  • soldering/rework
  • schematics
  • datasheets
  • component identification

Nice to have

  • GPU data center infrastructure
  • AI/ML technologies

What the JD emphasized

  • primary ownership of PCB triage and board-level fault isolation
  • strong expertise in board architecture, failure isolation, and rail bring-up
  • strong analytical mindset and skill at triaging complex PCB failures
  • running diagnostics and designing functional test DOEs to reproduce and isolate hard-to-find failures
  • timely, high-quality root cause analysis and corrective actions