Research Engineer, Machine Learning (Reinforcement Learning)

Anthropic · AI Frontier · London, United Kingdom · AI Research & Engineering

Research Engineer role focused on reinforcement learning to advance the capabilities and safety of large language models. The role involves implementing novel approaches, contributing to research direction, and creating agentic models for tasks such as computer use and autonomous software generation, as well as improving reasoning abilities and developing prototypes. Key responsibilities include architecting RL infrastructure, designing training environments and methodologies, driving performance improvements, and collaborating across teams.

What you'd actually do

  1. Architect and optimize core reinforcement learning infrastructure, from clean training abstractions to distributed experiment management across GPU clusters. Help scale our systems to handle increasingly complex research workflows.
  2. Design, implement, and test novel training environments, evaluations, and methodologies for reinforcement learning agents that push the state of the art for the next generation of models.
  3. Drive performance improvements across our stack through profiling, optimization, and benchmarking. Implement efficient caching solutions and debug distributed systems to accelerate both training and evaluation workflows.
  4. Collaborate across research and engineering teams to develop automated testing frameworks, design clean APIs, and build scalable infrastructure that accelerates AI research.

Skills

Required

  • Python
  • async/concurrent programming
  • Trio
  • machine learning frameworks (PyTorch, TensorFlow, JAX)
  • industry experience in machine learning research
  • research exploration
  • engineering implementation
  • code quality
  • testing
  • performance
  • systems design
  • communication skills

Nice to have

  • LLM architectures
  • LLM training methodologies
  • reinforcement learning techniques
  • reinforcement learning environments
  • virtualization
  • sandboxed code execution environments
  • Kubernetes
  • distributed systems
  • high-performance computing
  • Rust
  • C++

What the JD emphasized

  • reinforcement learning
  • agentic
  • tool use
  • autonomous software generation
  • reasoning abilities
  • RL infrastructure
  • training methodologies
  • evaluations
  • distributed systems

Other signals

  • developing systems that enable models to use computers effectively
  • advancing code generation through reinforcement learning
  • pioneering fundamental RL research for large language models
  • building scalable RL infrastructure and training methodologies
  • enhancing model reasoning capabilities
  • creating 'agentic' models via tool use for open-ended tasks such as computer use and autonomous software generation
  • improving reasoning abilities in areas such as mathematics
  • developing prototypes for internal use, productivity, and evaluation