Principal SoC Performance Architect - Microbenchmarks

AMD · Semiconductors · Austin, TX · Engineering

This Principal SoC Performance Architect role at AMD focuses on analyzing, characterizing, and optimizing the performance of next-generation Data Center GPU (DCGPU) platforms for AI training, inference, and HPC workloads. The role involves developing and maintaining microbenchmarks and system-level workloads across pre-silicon and post-silicon environments, identifying bottlenecks across the hardware/software stack, and providing actionable insights to improve performance. It requires a deep understanding of GPU architecture, parallel computing, and performance profiling tools.

What you'd actually do

  1. Analyze and optimize performance of DCGPU systems across AI training, inference, and HPC workloads
  2. Design and develop targeted microbenchmarks to characterize GPU subsystems (compute, memory, interconnect, collectives)
  3. Enable performance validation in pre-silicon environments (simulation/emulation/models)
  4. Work across the entire software stack: compiler, runtime, libraries, drivers, and firmware
  5. Develop and enhance performance measurement, profiling, and analysis tools

Skills

Required

  • GPU architecture
  • parallel computing
  • memory hierarchies
  • microbenchmark development
  • system-level workload analysis
  • performance profiling tools
  • AI/HPC workload analysis
  • hardware/software co-design
  • performance optimization
  • pre-silicon performance workflows
  • post-silicon performance workflows
  • C++
  • Python
  • GPU programming models (HIP, CUDA, OpenCL)
  • analytical skills
  • debugging skills
  • data-driven mindset
  • full software stack experience

Nice to have

  • performance modeling

What the JD emphasized

  • AI training
  • AI inference
  • HPC workloads
  • performance analysis
  • optimization
  • microbenchmarks
  • system-level workloads
  • GPU subsystems
  • pre-silicon
  • post-silicon
  • full software stack
  • performance measurement
  • profiling
  • analysis tools

Other signals

  • performance analysis
  • optimization
  • microbenchmarks
  • AI training
  • AI inference
  • HPC workloads
  • GPU architecture
  • full-stack performance