Model Behavior Architect, Alignment Finetuning

Anthropic Anthropic · AI Frontier · AI Research & Engineering

Role focused on shaping AI system behavior for alignment with human values through prompt engineering, data generation, and rigorous testing. Involves evaluating model judgment in domains like honesty, character, and ethics, and collaborating with research teams. Requires experience in prompt engineering, AI output evaluation, and understanding of LLM training/RL concepts.

What you'd actually do

  1. Design and implement subtle prompting strategies and data generation pipelines that improve model responses
  2. Identify and fix edge case behaviors through rigorous testing of your data generation pipelines
  3. Interact with models to carefully identify where model behavior and judgment can be improved
  4. Gather internal and external feedback on model behavior to document areas for improvement
  5. Develop evaluations of language model behaviors across judgment-based domains like honesty, character, and ethics

Skills

Required

  • prompt engineering
  • evaluating AI system outputs
  • Python
  • running basic scripts
  • identifying subtle issues in AI outputs
  • understanding how LLMs are trained
  • familiarity with reinforcement learning concepts
  • finetuning large language models
  • test-driven development
  • analyzing data and data pipelines

Nice to have

  • formal training in ethics or moral philosophy or moral psychology
  • data science with emphasis on data verification
  • conceptual understanding of language model training and finetuning techniques
  • experience developing evaluation frameworks for large language models
  • background in AI safety research
  • experience with RLHF
  • experience with constitutional AI
  • experience with other alignment techniques
  • published work related to AI ethics or safety
  • knowledge of model behavior benchmarking

What the JD emphasized

  • extensive experience with prompt engineering
  • strong skills in evaluating AI system outputs
  • keen eye for identifying subtle issues in AI outputs
  • experience finetuning large language models
  • formal training in ethics or moral philosophy or moral psychology
  • conceptual understanding of language model training and finetuning techniques
  • experience developing evaluation frameworks for large language models
  • published work related to AI ethics or safety
  • knowledge of model behavior benchmarking

Other signals

  • model alignment
  • AI safety
  • prompt engineering
  • model evaluation