Research Scientist, Reinforcement Learning (LLM) and Post-training

AMD · Semiconductors · Santa Clara, CA · Engineering

Research Scientist specializing in reinforcement learning for post-training and interactive learning of large generative models, applied to engineering and hardware-adjacent tasks. The role focuses on RL algorithms, empirical studies, and collaboration with infrastructure and product teams.

What you'd actually do

  1. Research and develop RL methods for post-training LLMs and code models on structured engineering tasks with verifiable or preference-based feedback
  2. Design reward models, curricula, and off-policy or on-policy training recipes suited to sparse, noisy, or expensive labels from experts and simulators
  3. Characterize failure modes (reward hacking, degenerate policies, instability) and propose mitigations grounded in experiments
  4. Collaborate with RL infra engineers to scale training; define interfaces for rollout generation, logging, and reproducibility
  5. Publish at top venues (e.g., NeurIPS, ICML, ICLR) and provide internal technical leadership on the RL roadmap

Skills

Required

  • Reinforcement Learning
  • LLM Post-training
  • Generative Models
  • Code Models
  • Tool Use
  • Policy Optimization
  • Reward Modeling
  • Empirical Studies
  • Distributed Training
  • Publication Record

Nice to have

  • Compilers
  • Kernels
  • EDA-style workflows
  • Large-scale codebases

What the JD emphasized

  • publish and ship
  • RL theory
  • practical path from ablation to production-scale training
  • reward misspecification
  • variance reduction
  • evaluation that reflects real constraints
  • non-trivial scale
  • GPUs
  • distributed jobs
  • LLM post-training
  • RLHF/RLAIF
  • policy optimization for language or code agents
  • compilers
  • kernels
  • EDA-style workflows
  • large-scale codebases

Other signals

  • Reinforcement Learning
  • LLM Post-training
  • Generative Models
  • Code Models
  • Tool Use
  • Decision Making