Failure Analysis Engineering Manager, GPU ASIC and PCBA Debug

AMD · Semiconductors · Secaucus, NJ · Engineering

AMD is seeking an experienced Failure Analysis Engineering Manager for its GPU ASIC and PCBA Debug team. The role involves leading and developing a team of FA engineers, overseeing customer and factory failure investigations, and driving root cause analysis and corrective actions. The manager will provide technical leadership for debug and triage of complex GPU and PCBA failures, drive debug automation, and ensure findings are clearly documented. The role requires strong people management skills, technical expertise in GPU ASIC and board-level failure analysis, and experience in hardware verification and system integration.

What you'd actually do

  1. Provide technical leadership for triage and debug of complex GPU and PCBA failures across power, ASIC, firmware, and thermals, guiding the FA team to root cause.
  2. Lead failure reproduction and triage by defining debug plans, directing investigations, and guiding experiments and escalation paths for complex issues.
  3. Drive debug automation, diagnostic tools, and data analysis methods that improve triage efficiency and consistency across failure domains.
  4. Lead cross-functional triage with manufacturing partners and AMD teams to align on failure hypotheses, reproduction, and root cause.
  5. Guide board-level debug using schematics, layouts, and design documentation to direct analysis and mentor engineers through the process.

Skills

Required

  • GPU ASIC debug
  • PCBA diagnostics
  • failure analysis
  • board-level debug
  • Python
  • shell scripting
  • Windows
  • Linux
  • schematics
  • datasheets
  • management experience

Nice to have

  • firmware
  • drivers
  • hardware interactions
  • hardware verification
  • system integration
  • failure reproduction
  • high-speed digital design
  • HBM or GDDR memory
  • PCIe
  • GPU data center systems