AI Scale-up Switch System Design Engineer

AMD · Semiconductors · Secaucus, NJ · General Management/Administration/Support

AMD is seeking a system design engineer to lead the bring-up and validation of AI scale-up switches for next-generation AI rack infrastructure. The role requires deep expertise in high-speed Ethernet, server management, and platform validation. Responsibilities include link bring-up, performance testing, and debugging complex system-level failures. The engineer will collaborate with hardware, firmware, and software teams to deliver production-ready systems, and will develop test automation scripts.

What you'd actually do

  1. Lead the system bring-up and validation of state-of-the-art AI scale-up switches purpose-built for high-density GPU compute racks, from initial power-on through full system validation
  2. Perform high-speed SerDes and link bring-up, including configuring and validating Auto-Negotiation/Link Training (AN/LT), tuning TX equalization, and characterizing signal integrity across 200G/400G/800G interfaces
  3. Execute comprehensive link qualification testing using PRBS (Pseudo-Random Binary Sequence) patterns, snake-traffic loopback tests, and FEC (Forward Error Correction) analysis to validate BER (bit error rate) performance at scale
  4. Use LinkCAT and Broadcom SDK tools to characterize port performance, diagnose link failures, and validate PHY configurations across large port counts
  5. Integrate and validate server management infrastructure including BMC/IPMI, Redfish API, and out-of-band management workflows for automated bring-up and health monitoring

Skills

Required

  • Python for test automation
  • debugging skills
  • communication skills

Nice to have

  • high-speed networking silicon characterization
  • Broadcom SDK/DAPI frameworks
  • high-speed Ethernet standards (400GbE, 800GbE)
  • AN/LT (IEEE 802.3)
  • RS-FEC / KP4-FEC
  • PAM4 SerDes technology
  • PRBS testing
  • BER measurement
  • eye diagram analysis
  • Snake/loopback traffic validation methodologies
  • LinkCAT or equivalent PHY/link characterization tools
  • IPMI
  • Redfish/OpenBMC
  • KCS
  • IPMB
  • PLDM
  • schematics and PCB layout
  • high-density switch/router platforms
  • AI/ML fabric infrastructure