GPU Performance Modeling & Optimization Engineer

AMD · Semiconductors · Bangalore, India · Engineering

Seeking an experienced GPU Performance Modeling and Optimization Engineer to focus on pre-silicon performance modeling, feature exploration, and workload optimization, as well as post-silicon characterization, hardware-software correlation, and performance debug for upcoming SoCs/GPUs. The role involves optimizing for both traditional graphics and cutting-edge compute/AI workloads, including Transformer-based models and LLMs.

What you'd actually do

  1. Simulator: Use and maintain accurate, modular simulators (cycle-accurate or cycle-approximate) for key GPU subsystems (e.g., Shader Engines, Cache Hierarchies, Memory Subsystems, and Interconnects).
  2. Microarchitectural Exploration: Define and execute rigorous simulation experiments to evaluate proposed GPU configurations, scaling limits, and trade-offs. Provide data-driven recommendations backed by thorough sensitivity analyses.
  3. Workload Characterization: Trace, analyze, and profile complex workloads to extract structural execution footprints. Map these insights to microarchitectural bottlenecks and establish upper and lower performance bounds for each workload.
  4. Compute & LLM Scaling Optimization: Profile and optimize performance for advanced generative AI and LLM topologies. Identify bottlenecks across the compute engine, local memory hierarchy (L1/L2), and SoC fabrics.
  5. Pre-to-Post Correlation: Lead efforts to execute workloads on early silicon, capture performance telemetry, and systematically correlate results back to pre-silicon performance models to improve simulator fidelity.
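
To make the bounding and correlation duties concrete, here is a minimal sketch, not AMD's methodology. It assumes a simple roofline-style bound (a kernel's runtime is limited by whichever is slower, compute or memory traffic) and a signed relative error for comparing a model prediction against a silicon measurement. The names `Kernel`, `roofline_time_s`, `bound_type`, and `correlation_error` are hypothetical, as are any peak throughput and bandwidth figures passed in.

```python
from dataclasses import dataclass


@dataclass
class Kernel:
    """Illustrative workload footprint; fields are assumptions, not a real trace format."""
    name: str
    flops: float        # total floating-point operations executed
    bytes_moved: float  # total bytes transferred to/from memory


def roofline_time_s(k: Kernel, peak_flops: float, peak_bw: float) -> float:
    """Lower bound on runtime: the slower of the compute and memory phases."""
    compute_s = k.flops / peak_flops
    memory_s = k.bytes_moved / peak_bw
    return max(compute_s, memory_s)


def bound_type(k: Kernel, peak_flops: float, peak_bw: float) -> str:
    """Classify the kernel by comparing arithmetic intensity to the ridge point."""
    ridge = peak_flops / peak_bw          # FLOPs per byte where both limits meet
    intensity = k.flops / k.bytes_moved   # FLOPs per byte for this kernel
    return "compute" if intensity >= ridge else "memory"


def correlation_error(model_s: float, measured_s: float) -> float:
    """Signed relative error of the model against silicon (negative = model optimistic)."""
    return (model_s - measured_s) / measured_s
```

With hypothetical peaks of 100 TFLOP/s and 2 TB/s, a large matrix multiply would typically land on the compute side of the ridge point, while a token-by-token LLM decode step, which streams weights for little arithmetic, would land on the memory side. Tracking `correlation_error` per kernel across early-silicon runs is one simple way to see where a simulator diverges from hardware.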

Skills

Required

  • GPU architecture
  • GPU execution pipelines
  • SIMD/SIMT models
  • cache hierarchy management
  • memory technologies
  • high-bandwidth interconnects
  • C++
  • Python
  • performance modeling
  • hardware simulators
  • workload profiling
  • post-silicon debugging
  • Linux
  • silicon performance engineering
  • microarchitecture design

Nice to have

  • ray tracing
  • rasterization
  • Transformer-based models
  • Large Language Models (LLMs)
  • Vision models
  • generative AI
  • PyTorch
  • Triton
  • PCIe
  • custom fabrics

What the JD emphasized

  • GPU performance modeling
  • LLM optimization
  • performance debug
  • hardware-software correlation
  • pre-silicon performance modeling
  • post-silicon characterization

Other signals

  • GPU performance modeling for AI workloads
  • LLM optimization
  • Pre-silicon and post-silicon characterization
  • Hardware-software co-design