Platform SoC Debug Lead Engineer

AMD · Semiconductors · Austin, TX · Engineering

This role focuses on leading platform and SoC debug for EPYC/AI server platforms, ensuring quality and milestone achievement. It involves deep collaboration with silicon, platform, firmware, and software teams to resolve critical issues and improve debug methodologies. The engineer will work on high-speed data-center I/O, RAS features, and provide root cause analysis for various platform and SoC issues.

What you'd actually do

  1. Lead debug efforts to enable AI/Server SoC platforms in domains such as high‑speed data‑center I/O (PCIe, CXL, etc.) and RAS features, efficiently resolving issues seen during program execution.
  2. Collaborate with partner organizations to provide root cause analysis for platform issues in a data-center environment. The debug role is expected to provide root cause analysis for platform-level, SoC logical, performance, and BIOS/firmware issues.
  3. Improve debug capabilities and methodology over time by identifying common challenges or impediments to efficient debug, and by working with partner organizations such as design, firmware, and software teams to drive innovation in silicon architecture, design, tools, and methods.
  4. Manage and track technical issues, risks, and priorities effectively with the business unit and SW Debug tools teams. Manage customer and executive communications, including program status, risks and opportunities.
  5. Maintain strong communication skills, both verbal and written, to convey summary findings and recommendations to senior management.

Skills

Required

  • Strong understanding of Server platform components, x86 or other complex CPU architectures.
  • Deep domain expertise in areas such as I/O interfaces (PCIe, CXL), RAS, and power management to drive comprehensive system-level test-plan execution.
  • Prior experience with computer system design and/or validation, testing tools, and environments.
  • Experience taking captures with oscilloscopes, protocol analyzers, and JTAG-based debug tools.
  • Proficiency in C, Python, and shell scripting for low-level development and debug.
  • Excellent organizational skills and the ability to prioritize multiple workstreams and meet tight deadlines.
  • Strong networking and relationship-building skills, with the ability to drive effective decision-making across various functions and levels within the organization.
  • BS or MS degree in Electrical Engineering or related major, with 12+ years of applicable experience.

Nice to have

  • Proficiency with Linux and/or Microsoft operating systems.
  • Understanding of BMC firmware and features, including IPMI, Redfish, sensor monitoring, power control, and remote management.
  • Knowledge of pre-silicon environments (verification, emulation, virtual bring-up).

What the JD emphasized

  • AI Platforms
  • AI/Server SoC platforms
  • AI