Sr Hardware Development Engineer, High Performance AI & ML Servers

Amazon · Big Tech · Austin, TX · Hardware Development

This role focuses on designing, delivering, and operating next-generation server infrastructure for AI training and inference workloads within AWS. It involves collaborating with various engineering teams and suppliers to ensure high performance, efficiency, and scalability of the hardware systems that power AI/ML and HPC in the cloud.

What you'd actually do

  1. Interface with our internal and external customers to understand project requirements and facilitate system development on top of your server design.
  2. Solve operational challenges in our existing fleet, both to improve the current customer experience and to inform better systems in future designs.
  3. Work directly with vendors and ODM/JDM design teams to develop and manufacture your product at scale.

Skills

Required

  • Bachelor's degree or above in electrical engineering, computer engineering, or equivalent
  • Experience with server, storage, networking, or large-scale distributed systems
  • Experience in developing functional specifications, design verification plans and functional test procedures
  • Experience troubleshooting issues and performing root cause analysis
  • Experience leading ODMs and other suppliers in the product development and manufacturing processes

Nice to have

  • Master's degree in Electrical Engineering, Computer Engineering, or a related technical field
  • Experience working with interdisciplinary teams to execute product design from concept to production
  • Expertise in product development disciplines such as thermal, mechanical, power, FW/SW, reliability, and sustaining
  • Experience deploying and operating hardware and applications across large data centers

What the JD emphasized

  • design, deliver, and operate next-generation infrastructure that powers breakthrough innovation in AI/ML and HPC workloads
  • pushing the limits of performance, efficiency, and scalability in the cloud
  • build the systems that define what’s next for AWS — and for the entire AI industry