Research Scientist/Engineer, Alignment Finetuning

Anthropic · AI Frontier · AI Research & Engineering

Research Scientist/Engineer focused on developing and implementing novel finetuning techniques to train language models for better alignment with human values (honesty, character, harmlessness). This involves using synthetic data generation and advanced training pipelines, and creating evaluation frameworks to measure alignment properties. The role also includes integrating improvements into production models and automating and scaling team workflows.

What you'd actually do

  1. Develop and implement novel finetuning techniques using synthetic data generation and advanced training pipelines
  2. Use these to train models to have better alignment properties including honesty, character, and harmlessness
  3. Create and maintain evaluation frameworks to measure alignment properties in models
  4. Collaborate across teams to integrate alignment improvements into production models
  5. Develop processes to help automate and scale the work of the team

Skills

Required

  • MS/PhD in Computer Science, ML, or related field, or equivalent experience
  • Strong programming skills, especially in Python
  • Experience with ML model training and experimentation
  • Track record of implementing ML research
  • Strong analytical skills for interpreting experimental results
  • Experience with ML metrics and evaluation frameworks
  • Ability to turn research ideas into working code
  • Ability to identify and resolve practical implementation challenges

Nice to have

  • Experience with language model finetuning
  • Background in AI alignment research
  • Published work in ML or alignment
  • Experience with synthetic data generation
  • Familiarity with techniques like RLHF, Constitutional AI, and reward modeling
  • Track record of designing and implementing novel training approaches
  • Experience with model behavior evaluation and improvement

What the JD emphasized

  • novel finetuning techniques
  • alignment properties
  • evaluation frameworks

Other signals

  • Develop and implement novel finetuning techniques
  • Train models that are more aligned with human values
  • Create and maintain evaluation frameworks to measure alignment properties