Sr. GPU/Accelerator Hardware Development Engineer, Annapurna Labs

Amazon · Big Tech · Austin, TX · Hardware Development

This role focuses on the hardware development of AI accelerator compute systems, specifically for AWS Trainium and Project Rainier. The engineer will be responsible for system design, validation, and integration of hardware in data centers, working with custom silicon designed for high-performance AI training. The role involves leading hardware development projects, optimizing hardware for AI workloads, and collaborating with cross-functional teams.

What you'd actually do

  1. We are looking for a Lead Hardware Design Engineer with strong skills in both hardware and software.
  2. In this role, you will be responsible for system design, validation, and integration of hardware in the AWS fleet through its entire life cycle.
  3. You will work cross-functionally with AWS monitoring teams, members of the Hardware Design team, and additional teams across AWS to improve the quality and reliability of products operating in the fleet.
  4. We are looking for candidates who thrive in a fast-paced, start-up-like environment and work independently to deliver multiple projects in parallel.
  5. To be successful, you need to be highly motivated and detail-oriented while meeting the highest standards and time-to-market, cost, and quality goals.

Skills

Required

  • BS or MS degree in Electrical or Computer Engineering (EE / CE)
  • Minimum of 5 years of experience with high-speed system design and validation
  • Experience with schematic and layout tools
  • Drive ODM HW development and testing and be part of the Production flow definition team
  • Strong knowledge in electrical engineering fundamentals, power & signal integrity, and analog/digital circuits
  • Able to drive selection and validation of electrical components, mechanical components, and cables
  • Experience with hardware development process and system development across full product life cycles
  • Experience using lab equipment such as bench power supplies, high-speed oscilloscopes, logic analyzers, spectrum analyzers, VNAs, and thermal chambers
  • Experience with supply chain management

Nice to have

  • Lead the end-to-end server hardware development lifecycle, from concept, architecture, and design through validation and production
  • Drive PCB design for server motherboards, accelerator carrier boards, and high-speed interconnect boards
  • Collaborate with silicon, firmware, and system software teams to enable optimal hardware/software co-design
  • Improve compute density, power efficiency, and network bandwidth utilization
  • Drive root cause analysis for hardware issues during validation and production

What the JD emphasized

  • Next Generation of AI accelerator compute systems
  • AWS Project Rainier
  • AWS Trainium
  • Project Rainier is a massive $11 billion Amazon Web Services (AWS) AI infrastructure initiative
  • over 500,000 custom Trainium2 chips
  • high-performance AI training
  • System design and optimization of hardware in our data centers
  • large scale server deployments
  • full product life cycles
  • end-to-end server hardware development lifecycle