What you'd actually do

Lead the design of an explainable GPU performance-estimation system for AMDGPU kernels.

Build models from ISA, compiler output, code-object metadata, profiler data, simulator traces, microbenchmarks, and architecture facts.

Model key performance drivers: wave scheduling, occupancy, VGPR/SGPR pressure, matrix pipelines, VALU/SALU issue pressure, VMEM/SMEM traffic, LDS bank conflicts, cache behavior, memory coalescing, waitcnt dependencies, barriers, and latency hiding.

Use current-generation GPUs, hardware counters, synthetic kernels, and simulator traces to calibrate and validate predictions for future architectures.

Produce reports with estimated cycles, lower and upper bounds, uncertainty, bottleneck attribution, missing facts, and source/PC-level explanations.

Skills

Required

GPU microarchitecture and execution models (waves/warps, SIMD/SIMT, registers, shared/local memory, caches, memory coalescing, barriers, occupancy, latency hiding, scheduling)
Quantitative performance analysis (instruction throughput, dependency chains, memory bandwidth, cache effects, occupancy cliffs, issue bottlenecks, resource contention)
Building analytical, trace-driven, simulation-based, or compiler-assisted performance models
C++ systems programming
Compilers, compiler IRs, machine-level code generation, static/dynamic analysis, target-specific optimization
Reading and reasoning about low-level assembly, ISA encodings, compiler output, and profiler traces
Performance analysis using profilers, hardware counters, traces, simulators, microbenchmarks, custom instrumentation
Technical leadership (setting direction, influencing, mentoring, explaining complex behavior)

Nice to have

AMDGPU, GCN, RDNA, CDNA, ROCm, HIP, HSA, AMDGPU LLVM backend
GPU performance tuning for HPC, AI, graphics, performance libraries
GPU architecture modeling, cycle simulators, trace-driven simulators, analytical performance models, silicon bring-up
Hardware-software co-design for new ISA features, memory systems, matrix/tensor units, schedulers, compiler-visible architecture features
Matrix/tensor instructions (MFMA, WMMA, tensor cores)
ROCm profiling tools, hardware performance counters, thread traces, ROCprof, ROCm Compute Profiler, Nsight Compute
Modeling LDS/shared-memory bank conflicts, cache behavior, memory coalescing, atomics, synchronization, barriers, tail effects
LLVM, MLIR, GCC, production compiler infrastructure
Binary analysis, disassembly, LLVM MC, ELF/code-object metadata, DWARF/source correlation, post-link analysis
Machine learning for performance modeling, trace analysis, anomaly detection, autotuning, learned residual correction

What the JD emphasized

Deep understanding of GPU microarchitecture and execution models

Strong quantitative performance intuition

Experience building analytical, trace-driven, simulation-based, or compiler-assisted performance models

Strong C++ systems programming skills

Experience with compilers, compiler IRs, machine-level code generation, static/dynamic analysis, or target-specific optimization

Ability to read and reason about low-level assembly, ISA encodings, compiler output, and profiler traces

Experience analyzing performance using profilers, hardware counters, traces, simulators, microbenchmarks, or custom instrumentation

Technical leadership

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. **Together, we advance your career.**

The Role

We are building next-generation infrastructure to predict, explain, and improve AMD GPU kernel performance across current and future architectures, including cases where final hardware is not yet available.

This role is for a deeply technical performance-modeling leader who understands GPU hardware and software end to end: ISA, compiler code generation, memory hierarchy, schedulers, matrix units, occupancy, profiling, simulation, and kernel behavior. You will help define how we model new GPU architectures, validate those models against hardware and simulators, and teach teams across compiler, runtime, architecture, and performance libraries how to reason about GPU performance.

The work sits at the intersection of GPU performance modeling, compiler analysis, microarchitecture, profiler/simulator validation, and architecture-aware optimization. Hardware-software co-design experience is highly valuable, especially for influencing future ISA features, compiler-visible architecture choices, memory-system behavior, and matrix/tensor pipelines.

What You Will Do

Lead the design of an explainable GPU performance-estimation system for AMDGPU kernels.
Build models from ISA, compiler output, code-object metadata, profiler data, simulator traces, microbenchmarks, and architecture facts.
Model key performance drivers: wave scheduling, occupancy, VGPR/SGPR pressure, matrix pipelines, VALU/SALU issue pressure, VMEM/SMEM traffic, LDS bank conflicts, cache behavior, memory coalescing, waitcnt dependencies, barriers, and latency hiding.
Use current-generation GPUs, hardware counters, synthetic kernels, and simulator traces to calibrate and validate predictions for future architectures.
Produce reports with estimated cycles, lower and upper bounds, uncertainty, bottleneck attribution, missing facts, and source/PC-level explanations.
Partner with GPU architecture, compiler, runtime, profiler, simulator, and performance-library teams to turn modeling results into better hardware, better compilers, and faster kernels.
Mentor engineers, write design docs, lead technical reviews, and educate teams on GPU architecture and performance behavior.
Explore ML-assisted modeling where it improves calibration, residual prediction, anomaly detection, microbenchmark selection, autotuning, or trace analysis while keeping the core model explainable and hardware-grounded.

What We Are Looking For

Deep understanding of GPU microarchitecture and execution models, including waves/warps, SIMD/SIMT execution, registers, shared/local memory, caches, memory coalescing, barriers, occupancy, latency hiding, and scheduling.
Strong quantitative performance intuition: instruction throughput, dependency chains, memory bandwidth, cache effects, occupancy cliffs, issue bottlenecks, and resource contention.
Experience building analytical, trace-driven, simulation-based, or compiler-assisted performance models.
Strong C++ systems programming skills and experience building production-quality low-level tools.
Experience with compilers, compiler IRs, machine-level code generation, static/dynamic analysis, or target-specific optimization.
Ability to read and reason about low-level assembly, ISA encodings, compiler output, and profiler traces.
Experience analyzing performance using profilers, hardware counters, traces, simulators, microbenchmarks, or custom instrumentation.
Technical leadership: ability to set direction, influence across teams, mentor others, and explain complex hardware/software behavior clearly.
Strong validation mindset: you care about evidence, counterexamples, error bars, and avoiding misleading point estimates.

Strong Plus

AMDGPU, GCN, RDNA, CDNA, ROCm, HIP, HSA, or the AMDGPU LLVM backend.
GPU performance tuning for HPC, AI, graphics, or performance libraries.
GPU architecture modeling, cycle simulators, trace-driven simulators, analytical performance models, or silicon bring-up.
Hardware-software co-design for new ISA features, memory systems, matrix/tensor units, schedulers, or compiler-visible architecture features.
Matrix/tensor instructions such as MFMA, WMMA, tensor cores, or other specialized math pipelines.
ROCm profiling tools, hardware performance counters, thread traces, ROCprof, ROCm Compute Profiler, Nsight Compute, or similar tooling.
Modeling LDS/shared-memory bank conflicts, cache behavior, memory coalescing, atomics, synchronization, barriers, and tail effects.
LLVM, MLIR, GCC, or other production compiler infrastructure.
Binary analysis, disassembly, LLVM MC, ELF/code-object metadata, DWARF/source correlation, or post-link analysis.
Machine learning for performance modeling, trace analysis, anomaly detection, autotuning, or learned residual correction.

Preferred Education

Bachelor's, Master's, or PhD in Computer Science, Computer Engineering, Electrical Engineering, or a related field, or equivalent industry experience.

#LI-G11

#LI-HYBRID

This role is not eligible for visa sponsorship.

_Benefits offered are described:_ AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD’s “Responsible AI Policy” is available here.

This posting is for an existing vacancy.