SoC / Debug Lead Engineer

AMD · Semiconductors · Austin, TX · Engineering

This role is for a SoC/Debug Lead Engineer focused on AMD's Server and AI Platforms. The primary responsibilities are leading system validation plans, collaborating with other engineering teams on root cause analysis of platform and SoC issues, and improving debug methodologies. The role requires a strong understanding of server platform components and CPU architectures, plus experience with debugging tools and scripting. Although the role works with AI platforms, the core function is hardware validation and debug, not AI model development or research.

What you'd actually do

  1. Lead system validation plans for EPYC and AI platforms to ensure alignment with program milestone criteria, drawing on strong expertise in x86 architecture, power management, high-speed data-center I/O (PCIe, CXL, etc.), and RAS features to drive test execution and resolve issues efficiently.
  2. Collaborate with partner organizations to provide root cause analysis for issues found in a data center environment, including platform-level, SoC logical, performance, and BIOS/firmware issues.
  3. Improve debug capabilities and methodology over time by identifying common impediments to efficient debug and working with partner organizations such as design, firmware, and software teams to drive innovation in silicon architecture, design, tools, and methods.
  4. Manage and track technical issues, risks, and priorities with the business unit and SW debug tools teams. Manage customer and executive communications, including program status, risks, and opportunities.
  5. Communicate summary findings and recommendations to senior management clearly, both verbally and in writing.

Skills

Required

  • BS or MS degree in Electrical Engineering or a related major, with 12+ years of applicable experience.
  • Strong understanding of server platform components and of x86 or other complex CPU architectures.
  • Hands-on experience operating and taking captures with oscilloscopes, protocol analyzers, and JTAG-based debug tools.
  • Proficiency in C, Python, and shell scripting for low-level development and debug.
  • Excellent organizational skills and the ability to prioritize multiple workstreams and meet tight deadlines.
  • Strong networking and relationship-building skills, with the ability to drive effective decision-making across various functions and levels within the organization.

Nice to have

  • Proficiency with Linux and/or Microsoft operating systems.
  • Domain expertise in at least one area such as I/O interfaces (PCIe, CXL), RAS, or power management.
  • Understanding of BMC firmware and features, including IPMI, Redfish, sensor monitoring, power control, and remote management.
  • Strong analytical skills and demonstrable experience designing experiments to solve problems.
  • Prior experience with computer system design and/or validation, testing tools, and environments.
  • Knowledge of pre-silicon environments (verification, emulation, virtual bring-up).