HBM Validation Engineer, Annapurna Labs

Amazon · Big Tech · Austin, TX · Hardware Development

Seeking an HBM/DDRx validation expert to validate AWS next-generation ML chips, cards, and server integration. The role involves executing HBM validation across Trainium platforms to improve HBM characteristics for AI servers: working with vendors, understanding settings, writing and modifying tests, debugging, and collecting data. It also involves collaborating with architects, design teams, and software engineers; supporting ongoing debug and operations of ML chips; diving deep into IP integration, packaging, silicon bring-up, characterization, and validation of HBM subsystems; and developing scripts for daily tasks. The role requires a BS in Electrical Engineering, Computer Engineering, Systems Engineering, Computer Science, or a related field, with 5+ years of experience in silicon development, including 3+ years in SoC/IO/subsystems. An understanding of DDR/HBM at the PHY and controller level and knowledge of DDR/HBM training, timing parameters, and/or controller features are essential. Experience with physical design support, scripting (Lua, Bash, Python), cross-functional triage, system-level debug, and working with third parties is also required.

What you'd actually do

  1. Collaborate with architects, design teams, and software engineers on our next-generation ML chips
  2. Support ongoing debug and operations of previous ML chips within manufacturing and the data center
  3. Dive deep into IP integration, packaging, silicon bring-up, characterization, and validation of our HBM subsystems
  4. Independently develop the scripts you need to execute and collaborate with software engineers as your needs scale

Skills

Required

  • BS in Electrical Engineering, Computer Engineering, Systems Engineering, Computer Science, or a related field
  • 5+ years of experience in silicon development
  • 3+ years in SoC/IO/subsystems
  • Good understanding of DDR/HBM at the PHY and controller level
  • Good knowledge of DDR/HBM training, timing parameters and/or controller features
  • Support the physical design team with IP integration, silicon design, 2.5D packaging, clocking and timing constraints
  • Ability to create scripts (Lua, Bash, Python, etc.) to accomplish functional day-to-day tasks
  • Drive cross-functional triage efforts on functional and performance issues
  • Perform system-level debug and root-cause analysis through the bring-up, characterization, validation, and production phases
  • Experience working with third parties

Nice to have

  • Experience with AWS next-generation ML chips, cards, and server integration
  • Experience with Trainium platforms
  • Experience working with a memory team
  • Scripting knowledge and experience with AI tooling

What the JD emphasized

  • HBM validation expert
  • Next-generation ML chips
  • Trainium AI servers
  • HBM subsystems