Research Engineer / Scientist, Alignment Science

Anthropic · AI Frontier · San Francisco, CA · AI Research & Engineering

A Research Engineer/Scientist role in AI safety and alignment: you run experiments to understand and steer AI behavior, with an emphasis on risks from powerful future systems. The role involves collaboration with the interpretability, fine-tuning, and red teaming teams. Research areas include scalable oversight, AI control, stress-testing, automated alignment research, alignment assessments, safeguards research, and model welfare.

What you'd actually do

  1. Contribute to exploratory experimental research on AI safety, with a focus on risks from powerful future systems
  2. Build and run elegant and thorough machine learning experiments to help us understand and steer the behavior of powerful AI systems
  3. Develop techniques to keep highly capable models helpful and honest, even as they surpass human-level intelligence in various domains
  4. Create methods to ensure advanced AI systems remain safe and harmless in unfamiliar or adversarial scenarios
  5. Build and align a system that can speed up and improve alignment research

Skills

Required

  • significant software, ML, or research engineering experience
  • experience contributing to empirical AI research projects
  • familiarity with technical AI safety research
  • Python

Nice to have

  • authoring research papers in machine learning, NLP, or AI safety
  • experience with LLMs
  • experience with reinforcement learning
  • experience with Kubernetes clusters and complex shared codebases

What the JD emphasized

  • AI safety
  • alignment
  • powerful future systems
  • human-level capabilities
  • AI safety research
  • LLM
  • reinforcement learning
  • Python

Other signals

  • AI safety research
  • alignment
  • evaluating AI systems
  • large language models