Graphics IP RAS Engineer

AMD · Semiconductors · Austin, TX · Engineering

AMD is seeking a mid-level Graphics IP RAS (reliability, availability, and serviceability) Engineer to work on system reliability and RAS features for next-generation AMD products, including AI accelerators. The role involves designing, analyzing, and improving RAS features in collaboration with senior architects, and evaluating reliability metrics. Experience with VLSI-based designs, digital logic, and computer architecture is required; exposure to AI/ML accelerator architectures and to reliability challenges in AI systems is preferred.

What you'd actually do

  1. Contribute to the definition and development of RAS features and capabilities for next‑generation GFXIP deployments.
  2. Assist in writing and reviewing architectural specifications related to RAS features.
  3. Work with senior RAS and GFXIP architects to learn and apply reliability concepts, including error detection, containment, recovery, and reporting.
  4. Help analyze and understand reliability metrics such as FIT, DPPM, MTBF, and MTBR, and how they trade off against performance, power, area (PPA), and security.
  5. Support hardware design teams in evaluating RAS protection and coverage, and in identifying gaps or improvement opportunities.

Skills

Required

  • digital logic
  • computer architecture
  • hardware design fundamentals
  • read and reason about specifications, RTL, and architectural documentation
  • communication skills
  • work in cross-functional engineering teams
  • VLSI-based designs

Nice to have

  • reliability, fault tolerance, or system‑level robustness
  • parity and ECC
  • watchdog timers
  • heartbeat monitors
  • deferred or recoverable error handling
  • GPU architecture
  • RTL design
  • failure rates and figures of merit
  • AI/ML accelerator architectures
  • AI compute environments
  • reliability challenges in large-scale AI systems

What the JD emphasized

  • strong mid‑level engineer
  • system reliability
  • RAS architecture
  • RAS features
  • hardware reliability
  • RAS experience
  • system robustness and reliability
  • RAS expert
  • RAS concepts
  • reliability metrics
  • RAS protection
  • RAS requirements
  • hardware or system features
  • computer architecture
  • reliability, fault tolerance, or system‑level robustness
  • RAS-related concepts
  • GPU architecture
  • AI/ML accelerator architectures
  • AI compute environments
  • reliability challenges in large-scale AI systems