Sr. Post-silicon Systems Software Validation Engineer, Annapurna Labs

Amazon · Big Tech · Austin, TX · Software Development

This role focuses on validating next-generation machine learning accelerators for AWS, covering the full vertical stack from silicon to system. The engineer develops validation strategies, executes test plans, debugs hardware and software, and collaborates with cross-functional teams to ensure the quality and performance of AI/ML accelerators used in AWS data centers.

What you'd actually do

  1. Lead and own critical validation work across the entire product development lifecycle, from early design validation through emulation, silicon bring-up, post-silicon validation, and ongoing support of production systems deployed in AWS data centers.
  2. Collaborate deeply with architecture, RTL design, design verification, firmware, and software teams to ensure next-generation AI/ML accelerators meet the highest standards of quality and performance.
  3. Develop comprehensive validation strategies and lead new methodologies that improve validation coverage and time to root cause.
  4. Develop and execute detailed test plans covering functional, performance, power, and stress testing from silicon bring-up to product release.
  5. Validate ML accelerator performance, accuracy, and reliability using real-world neural network workloads.

Skills

Required

  • Python
  • Lua
  • C/C++
  • Rust
  • Go
  • computer architecture
  • chip/system validation methodologies
  • cloud infrastructure
  • CI/CD
  • Firmware testing and/or development (BIOS, BMC, drivers)
  • PCIe
  • HBM
  • GPUs
  • neural networks
  • ML HW architecture
  • RTL simulation (SystemVerilog/UVM, VCS, Questa, Xcelium)
  • emulation (Palladium, Zebu, Veloce)
  • silicon failure analysis and debug
  • Linux environments
  • Git
  • server hardware
  • debug tools

Nice to have

  • Machine Learning Hardware/Software Architecture
  • CI/CD
  • EDA Simulations or Emulation

What the JD emphasized

  • next-generation machine learning accelerators
  • AI training and inference
  • ML workloads

Other signals

  • validating next-generation machine learning accelerators
  • power AWS's cloud computing infrastructure
  • accelerating the development of custom silicon
  • AI training and inference