What you'd actually do

Research and develop RL methods for post-training LLMs and code models on structured engineering tasks with verifiable or preference-based feedback

Design reward models, curricula, and off-policy or on-policy training recipes suited to sparse, noisy, or expensive labels from experts and simulators

Characterize failure modes (reward hacking, degenerate policies, instability) and propose mitigations grounded in experiments

Collaborate with RL infra engineers to scale training; define interfaces for rollout generation, logging, and reproducibility

Publish at top venues (e.g. NeurIPS, ICML, ICLR) and contribute internal technical leadership on the RL roadmap

What the JD emphasized

publish and ship

RL theory

practical path from ablation to production-scale training

reward misspecification

variance reduction

evaluation that reflects real constraints

non-trivial scale

GPUs

distributed jobs

LLM post-training

RLHF/RLAIF

policy optimization for language or code agents

compilers

kernels

EDA-style workflows

large-scale codebases

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity, and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. **Together, we advance your career.**

THE ROLE:

We are hiring a Research Scientist, Reinforcement Learning (LLM) and Post-Training, specializing in reinforcement learning to advance post-training and interactive learning for large generative models applied to demanding engineering and hardware-adjacent tasks (code, optimization, tool use, and long-horizon decision making). You will invent and analyze RL algorithms—policy optimization, preference-based methods, exploration, credit assignment, and reward modeling—run rigorous empirical studies, and partner with infra and product teams to land methods that improve measurable task success without sacrificing stability or safety.

THE PERSON:

You publish and ship. You are fluent in both RL theory and the practical path from ablation to production-scale training. You care about reward misspecification, variance reduction, and evaluation that reflects real constraints—not only toy environments.

KEY RESPONSIBILITIES:

Research and develop RL methods for post-training LLMs and code models on structured engineering tasks with verifiable or preference-based feedback
Design reward models, curricula, and off-policy or on-policy training recipes suited to sparse, noisy, or expensive labels from experts and simulators
Characterize failure modes (reward hacking, degenerate policies, instability) and propose mitigations grounded in experiments
Collaborate with RL infra engineers to scale training; define interfaces for rollout generation, logging, and reproducibility
Publish at top venues (e.g. NeurIPS, ICML, ICLR) and contribute internal technical leadership on the RL roadmap

PREFERRED EXPERIENCE:

Strong publication record in reinforcement learning or closely related machine learning areas.
Hands-on experience training RL or preference-optimized models at non-trivial scale (GPUs, distributed jobs)
Experience with LLM post-training, RLHF/RLAIF, or policy optimization for language or code agents
Familiarity with compilers, kernels, EDA-style workflows, or large-scale codebases is a plus

ACADEMIC CREDENTIALS:

PhD in Computer Science, Machine Learning, or related field strongly preferred.

#LI-Hybrid

_Benefits offered are described:_ AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD’s “Responsible AI Policy” is available here.

This posting is for an existing vacancy.

Research Scientist, Reinforcement Learning (LLM) and Post-Training
