Research Engineer, RL Scaling Science

Anthropic · AI Frontier · London, United Kingdom · AI Research & Engineering

Research Engineer focused on scaling Reinforcement Learning (RL) for frontier models. Designs and runs large-scale RL experiments to understand and resolve bottlenecks, builds benchmarks for long-horizon progress, and ships validated findings into production training recipes. Operates at the research/engineering boundary.

What you'd actually do

  1. Design, run, and interpret large-scale RL experiments, reasoning rigorously about what the data does and doesn't show
  2. Investigate how RL improves as horizon, compute, and model size grow
  3. Build and maintain benchmarks for long-horizon RL so progress is measurable and reproducible
  4. Translate validated findings into production training recipes, exercising judgment about when a result is robust enough to ship
  5. Debug complex issues at the seam where research meets infrastructure, including failures that only appear at scale

Skills

Required

  • Reinforcement Learning
  • large-scale ML training
  • Python
  • large-scale or distributed ML systems
  • debugging complex issues at the research/systems boundary
  • societal impacts of AI
  • responsible scaling

Nice to have

  • published or shipped work in long-horizon RL or RL fundamentals
  • translating research findings into production training recipes
  • large-scale industry impact via RL interventions
  • frontier-scale training runs with long trajectories

What the JD emphasized

  • large-scale experiments
  • long-horizon RL
  • production training recipes
  • frontier-scale training runs

Other signals

  • frontier models
  • scaling RL