WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity, and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

WHY AMD

At AMD, we build products that accelerate next-generation computing—from AI and data centers to PCs, gaming, and embedded systems. Progress here comes from bold ideas, strong execution, and people who care about hard problems.

This role sits at the center of that mission: making AMD the platform of choice for the most demanding AI workloads by improving how models train, align, and run on our GPUs.

THE OPPORTUNITY

We're looking for a senior software engineer who combines deep systems performance work with modern AI—someone who can shape software from GPU kernels through distributed training and inference.

You'll join a core team of specialists working on the latest AMD hardware and software. Your work will directly influence the ROCm ecosystem and how foundation models and agentic systems perform on AMD GPUs.

The challenge: Help train and run AI systems that make AI itself more efficient on GPUs—tuning stacks, kernels, and workflows in ways that can materially shift what's possible on our hardware.

This is a high-impact, hands-on role. You'll own hard technical problems, influence direction across teams, and mentor others as we scale AMD's AI software strategy.

WHAT YOU'LL DO

Own the AI software stack: Establish best practices and drive performance from low-level GPU kernels to large-scale distributed systems. Use modern LLMs and agent-based tooling where it accelerates development and tuning of the ROCm ecosystem.
Accelerate foundation models and agents: Improve training, post-training, and inference for LLMs and autonomous AI workloads so AMD is the default platform for the most demanding use cases.
Co-design hardware and software: Partner on the full lifecycle—from GPU architecture input to software for new accelerators—and engage with the broader AI community to keep AMD at the forefront.

WHAT WE'RE LOOKING FOR

We need someone who can go deep in the areas below and collaborate effectively.

SYSTEMS & GPU PERFORMANCE (KERNEL ENGINEERING)

Expert-level modern C++ and design of large, performance-critical systems.
Strong grasp of GPU architecture, memory hierarchy, and kernel optimization (HIP/CUDA).
Hands-on delivery on large-scale C++/HIP/CUDA codebases, such as ROCm (rocBLAS, hipDNN, Composable Kernel, AITemplate), the CUDA ecosystem (cuBLAS, cuDNN, CUTLASS, Thrust, CUB, NCCL), and ML framework cores such as PyTorch, TensorFlow, or JAX (C++/HIP/CUDA paths).
Comfort diagnosing bottlenecks with profilers (for example, ROCm Profiler and Nsight) in multi-GPU, distributed settings.

AI POST-TRAINING & LLM SYSTEMS

Deep understanding of transformers, attention, and the full model lifecycle.
Hands-on work in alignment and post-training—for example, SFT, RLHF, and GRPO.
Awareness of current LLM trends, including MoE, quantization, speculative decoding, and agentic systems.
Experience optimizing post-training and inference pipelines at scale.

PREFERRED BACKGROUND

Substantial professional experience in software development within performance-critical environments.
Extensive HIP/CUDA experience optimizing deep learning and OSS LLM inference/training kernels and operators.
Strong technical ownership and a track record of shipping complex systems.
Clear communication and influence across teams.
Plus: Deep familiarity with the AMD ROCm/HIP ecosystem.
Plus: Working knowledge of RTL design and Verilog/SystemVerilog for hardware–software co-design.

EDUCATION

Bachelor's in Computer Science, Computer Engineering, Electrical Engineering, or equivalent experience.
Master's preferred; PhD a plus.
Publications in AI/ML, GPU computing, or systems optimization are valued.

#LI-AG2

#LI-HYBRID

Benefits offered are described: AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD’s “Responsible AI Policy” is available here.

This posting is for an existing vacancy.