Research Engineer, Environment Scaling

Anthropic · AI Frontier · United States · Remote · AI Research & Engineering

This role focuses on improving the intelligence of Anthropic's public models by building and managing RL training environments. The work involves identifying high-value tasks, designing reward signals, managing external data vendors, and evaluating model performance; it combines ML research, data operations, and project management.

What you'd actually do

  1. Improve and execute our fine-tuning strategies for adapting Claude to new domains and tasks
  2. Manage technical relationships with external data vendors, including evaluation of data quality and reward design
  3. Collaborate with domain experts to design data pipelines and evaluations
  4. Explore novel ways of creating RL environments for high-value tasks
  5. Develop and improve QA frameworks to catch reward hacking and ensure environment quality

Skills

Required

  • fine-tuning large language models
  • reinforcement learning
  • reward design
  • training data curation for LLMs
  • managing technical vendor relationships
  • reading through datasets to understand them and spot issues
  • ML research
  • data operations
  • project management

Nice to have

  • training production ML systems
  • distributed systems
  • cloud infrastructure
  • domain expertise in an area where we would like to make our models more useful
  • working with external vendors or technical partners

What the JD emphasized

  • improve the intelligence of our public models
  • building the training environments that fuel RL at scale
  • own the end-to-end process of creating RL environments for new capabilities
  • identifying high-value tasks
  • designing reward signals
  • managing vendor relationships
  • measuring impact on model performance
