Research Engineer, Code RL (Reinforcement Learning)

Anthropic · AI Frontier · San Francisco, CA · AI Research & Engineering

This Research Engineer role focuses on reinforcement learning (RL) for code generation. The goal is to improve models' ability to write, edit, test, debug, and ship software. The work involves designing RL environments, building reward signals, running training experiments, and improving pipeline efficiency, and it blends research with engineering implementation.

What you'd actually do

  1. Design RL environments and coding tasks
  2. Build the reward signals and verifiers that capture what "good code" means
  3. Run training experiments on frontier models
  4. Diagnose why a model does (or doesn't) get better at a class of software-engineering work
  5. Improve the speed and reliability of the pipelines that let all of this work iterate quickly

Skills

Required

  • strong software-engineering skills
  • deep Python expertise
  • async/concurrent programming
  • owning systems end to end
  • debugging across the stack
  • balancing research exploration with engineering implementation
  • shaping experimental design
  • interpreting results
  • code quality
  • testing
  • performance

Nice to have

  • reinforcement learning
  • RLHF
  • post-training
  • LLM finetuning
  • coding agents
  • code-execution sandboxes
  • eval harnesses
  • verifiers
  • developer tooling
  • program analysis
  • testing
  • verification
  • compilers
  • formal methods
  • PyTorch
  • large-scale distributed training
  • performance profiling
  • optimization of ML systems
  • CUDA / GPU or TPU kernel experience
  • accelerator-performance intuition

Other signals

  • developing systems that enable models to use computers effectively
  • advancing code generation through reinforcement learning
  • pioneering fundamental RL research for large language models
  • building scalable RL infrastructure and training methodologies
  • enhancing model reasoning capabilities