Senior Compiler Engineer, GPU Code Object Rewriting & Tooling

AMD · Semiconductors · San Jose, CA · Engineering

This role focuses on building next-generation infrastructure for predicting, explaining, and improving AMD GPU kernel performance. The work centers on performance-estimation systems built from sources such as ISA, compiler output, and simulator traces. The engineer will model key performance drivers, validate predictions against hardware and simulators, and partner with other teams to optimize hardware, compilers, and kernels. ML-assisted modeling is listed as an area to explore, but the core of the role is GPU performance modeling, compiler analysis, and microarchitecture. Building AI/ML models is not the primary output.

What you'd actually do

  1. Lead the design of an explainable GPU performance-estimation system for AMDGPU kernels.
  2. Build models from ISA, compiler output, code-object metadata, profiler data, simulator traces, microbenchmarks, and architecture facts.
  3. Model key performance drivers: wave scheduling, occupancy, VGPR/SGPR pressure, matrix pipelines, VALU/SALU issue pressure, VMEM/SMEM traffic, LDS bank conflicts, cache behavior, memory coalescing, waitcnt dependencies, barriers, and latency hiding.
  4. Use current-generation GPUs, hardware counters, synthetic kernels, and simulator traces to calibrate and validate predictions for future architectures.
  5. Produce reports with estimated cycles, lower and upper bounds, uncertainty, bottleneck attribution, missing facts, and source/PC-level explanations.
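This is a rough illustration of the kind of explainable estimate described in the duties above. The sketch converts per-resource demand into three outputs:

- a cycle lower bound: the slowest resource, assuming all resources overlap perfectly;
- an upper bound: all resources fully serialized;
- a bottleneck attribution.

It also computes how VGPR usage caps waves per SIMD. It is a minimal sketch under stated assumptions, not AMD's method or any real tool's API. The names ResourceDemand, CycleEstimate, estimateCycles, and wavesPerSimdFromVgprs are hypothetical. Throughput figures are supplied by the caller. The "upper bound" ignores exposed latency such as waitcnt stalls, so it is only an upper bound under that simplification.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical demand a kernel places on one execution resource,
// e.g. counted from ISA. Names and units are illustrative assumptions.
struct ResourceDemand {
  std::string name;  // e.g. "VALU", "VMEM", "LDS"
  double work;       // operations issued to this resource
  double perCycle;   // operations the resource can retire per cycle (assumed)
  double cycles() const {
    assert(perCycle > 0.0);
    return work / perCycle;
  }
};

struct CycleEstimate {
  double lower = 0.0;      // perfect overlap: the slowest resource dominates
  double upper = 0.0;      // no overlap: per-resource times add up
  std::string bottleneck;  // resource that sets the lower bound
};

// Throughput-only bounds: max of per-resource cycles is the lower bound,
// their sum is the upper bound. Assumption: latency is fully hidden, so
// dependency and waitcnt stalls are NOT modeled here.
CycleEstimate estimateCycles(const std::vector<ResourceDemand>& demands) {
  CycleEstimate e;
  for (const auto& d : demands) {
    const double c = d.cycles();
    e.upper += c;
    if (c > e.lower) {
      e.lower = c;
      e.bottleneck = d.name;
    }
  }
  return e;
}

// Waves per SIMD permitted by VGPR usage alone. The register budget,
// allocation granule, and wave cap are caller-supplied assumptions,
// not values taken from a specific AMD architecture.
int wavesPerSimdFromVgprs(int vgprsPerWave, int vgprBudget, int granule,
                          int maxWaves) {
  assert(vgprsPerWave > 0 && granule > 0);
  // Allocation is rounded up to a whole number of granules.
  const int allocated = (vgprsPerWave + granule - 1) / granule * granule;
  return std::min(maxWaves, vgprBudget / allocated);
}
```

The example below assumes illustrative GCN-like parameters: a budget of 256 VGPRs per SIMD lane, an allocation granule of 4, and a 10-wave cap.

- wavesPerSimdFromVgprs(24, 256, 4, 10) returns 10.
- wavesPerSimdFromVgprs(25, 256, 4, 10) returns 9.

That drop from one extra register is a small instance of an occupancy cliff. Confirm real values against the target's ISA documentation before relying on them. The max/sum pair is about the simplest explainable bound. A fuller model would tighten the range using dependency chains, latency hiding, and calibration against hardware counters.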

Skills

Required

  • GPU microarchitecture and execution models (waves/warps, SIMD/SIMT, registers, shared/local memory, caches, memory coalescing, barriers, occupancy, latency hiding, scheduling)
  • Quantitative performance analysis (instruction throughput, dependency chains, memory bandwidth, cache effects, occupancy cliffs, issue bottlenecks, resource contention)
  • Building analytical, trace-driven, simulation-based, or compiler-assisted performance models
  • C++ systems programming
  • Compilers, compiler IRs, machine-level code generation, static/dynamic analysis, target-specific optimization
  • Reading and reasoning about low-level assembly, ISA encodings, compiler output, and profiler traces
  • Performance analysis using profilers, hardware counters, traces, simulators, microbenchmarks, custom instrumentation
  • Technical leadership (setting direction, influencing, mentoring, explaining complex behavior)

Nice to have

  • AMDGPU, GCN, RDNA, CDNA, ROCm, HIP, HSA, AMDGPU LLVM backend
  • GPU performance tuning for HPC, AI, graphics, performance libraries
  • GPU architecture modeling, cycle simulators, trace-driven simulators, analytical performance models, silicon bring-up
  • Hardware-software co-design for new ISA features, memory systems, matrix/tensor units, schedulers, compiler-visible architecture features
  • Matrix/tensor instructions (MFMA, WMMA, tensor cores)
  • ROCm profiling tools, hardware performance counters, thread traces, ROCprof, ROCm Compute Profiler, Nsight Compute
  • Modeling LDS/shared-memory bank conflicts, cache behavior, memory coalescing, atomics, synchronization, barriers, tail effects
  • LLVM, MLIR, GCC, production compiler infrastructure
  • Binary analysis, disassembly, LLVM MC, ELF/code-object metadata, DWARF/source correlation, post-link analysis
  • Machine learning for performance modeling, trace analysis, anomaly detection, autotuning, learned residual correction

What the JD emphasized

  • Deep understanding of GPU microarchitecture and execution models
  • Strong quantitative performance intuition
  • Experience building analytical, trace-driven, simulation-based, or compiler-assisted performance models
  • Strong C++ systems programming skills
  • Experience with compilers, compiler IRs, machine-level code generation, static/dynamic analysis, or target-specific optimization
  • Ability to read and reason about low-level assembly, ISA encodings, compiler output, and profiler traces
  • Experience analyzing performance using profilers, hardware counters, traces, simulators, microbenchmarks, or custom instrumentation
  • Technical leadership