Post-silicon Systems Validation Engineer, Annapurna Labs

Amazon · Big Tech · Austin, TX · Software Development

This role focuses on validating next-generation machine learning accelerators for AWS cloud infrastructure, covering the entire vertical stack from silicon to system. The engineer will be responsible for developing validation strategies, executing test plans, conducting hands-on bring-up and debug, and validating ML accelerator performance using real-world workloads. The role requires collaboration with various engineering teams and a strong understanding of computer architecture and ML fundamentals.

What you'd actually do

  1. Developing comprehensive validation strategies and detailed test plans covering functional, performance, power, and stress testing from silicon bring-up to product release
  2. Executing complex test plans from RTL simulation and emulation environments through physical silicon validation
  3. Conducting hands-on silicon bring-up and debug in the lab using oscilloscopes, logic analyzers, and protocol analyzers
  4. Validating ML accelerator performance, accuracy, and reliability using real-world neural network workloads
  5. Building test infrastructure, CI/CD, and automated regression frameworks to enable efficient validation at scale

Skills

Required

  • 3+ years of non-internship professional software development experience
  • 2+ years of non-internship experience in the design or architecture (design patterns, reliability, and scaling) of new and existing systems
  • Experience with Machine Learning and Large Language Model fundamentals, including architecture, training/inference lifecycles, and optimization of model execution, or experience working with PyTorch or JAX software
  • Bachelor's degree in computer science, engineering, mathematics or equivalent, or experience in Java, C++, Python, or a related language
  • 3+ years of experience with hardware performance counters and profiling tools for analyzing and optimizing system and application performance
  • Strong understanding of computer architecture fundamentals including memory hierarchies (caches, DRAM, HBM), compute pipelines, and interconnect topologies
  • Experience applying statistical methods, regression analysis, and data visualization techniques to interpret performance data and drive optimization decisions
  • Strong programming skills (Python, Lua, C/C++, Rust, Go, etc.)
  • A solid understanding of computer architecture
  • Validation experience in any of these areas: PCIe, HBM, GPUs, neural networks, ML HW architecture, and/or CI/CD
  • Familiarity with the validation lifecycle from RTL simulation (SystemVerilog/UVM, VCS, Questa, Xcelium) and emulation (Palladium, ZeBu, Veloce) through silicon failure analysis and debug

Nice to have

  • Experience with AWS services, cloud infrastructure, firmware development (BIOS, BMC, drivers)
  • 3+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
  • Bachelor's degree in computer science or equivalent
  • Experience with Machine Learning Hardware/Software Architecture
  • Experience with CI/CD
  • Experience with EDA Simulations or Emulation

What the JD emphasized

  • next-generation machine learning accelerators
  • AI training and inference
  • ML workloads
  • ML accelerator performance, accuracy, and reliability

Other signals

  • validating next-generation machine learning accelerators
  • power AWS's cloud computing infrastructure