Research Engineer, Domain Scaling

Anthropic · AI Frontier · San Francisco, CA · AI Research & Engineering

Research Engineer focused on scaling AI models for real-world knowledge work in domains like finance, healthcare, and legal. This role involves owning the end-to-end data strategy, from sourcing tasks to RL training, including designing reward signals, managing external data vendors, and developing QA frameworks to ensure environment quality and prevent reward hacking. It combines applied research with hands-on data work.

What you'd actually do

  1. Own the data strategy for knowledge work verticals end-to-end, from task sourcing through RL training
  2. Manage technical relationships with external data vendors, including evaluation of data quality and reward design
  3. Collaborate with domain experts to design data pipelines and evaluations
  4. Explore novel ways of creating RL environments for high-value tasks
  5. Develop and improve QA frameworks to catch reward hacking and ensure environment quality

Skills

Required

  • experience with fine-tuning large language models for specific domains or real-world use cases
  • experience with reinforcement learning, reward design, or training data curation for LLMs
  • comfortable managing technical vendor relationships and iterating quickly on feedback
  • strong cross-functional collaboration skills

Nice to have

  • experience training production ML systems
  • experience designing evals or benchmarks for LLMs
  • domain expertise in a vertical where we would like to make our models more useful
  • experience working with external vendors or technical partners

What the JD emphasized

  • end-to-end process of creating RL environments
  • identifying high-value tasks
  • designing reward signals
  • managing vendor relationships
  • measuring impact on model performance
  • fine-tuning large language models for specific domains or real-world use cases
  • reinforcement learning, reward design, or training data curation for LLMs
  • managing technical vendor relationships
  • designing evals or benchmarks for LLMs

Other signals

  • end-to-end data strategy for knowledge work verticals
  • designing reward signals
  • managing vendor relationships for data sourcing
  • exploring novel ways of creating RL environments
  • developing and improving QA frameworks for reward hacking