Research Engineer / Scientist, Alignment Science - London

Anthropic · AI Frontier · London, United Kingdom · AI Research & Engineering

Research Engineer/Scientist focused on AI safety and alignment, conducting experimental research to understand and steer the behavior of powerful AI systems. The role involves testing the robustness of safety techniques, running multi-agent RL experiments, building tooling for evaluating jailbreaks, and contributing to research papers. Collaboration with other teams, including Interpretability, Fine-Tuning, and the Frontier Red Team, is expected.

What you'd actually do

  1. Build and run elegant and thorough machine learning experiments to help us understand and steer the behavior of powerful AI systems.
  2. Contribute to exploratory experimental research on AI safety, with a focus on risks from powerful future systems.
  3. Run multi-agent reinforcement learning experiments to test techniques such as AI Debate.
  4. Build tooling to efficiently evaluate the effectiveness of novel LLM-generated jailbreaks.
  5. Write scripts and prompts to efficiently produce evaluation questions to test models’ reasoning abilities in safety-relevant contexts.

Skills

Required

  • Software engineering experience
  • ML experience
  • Research engineering experience
  • Empirical AI research projects
  • Technical AI safety research
  • Python

Nice to have

  • Authoring research papers in machine learning, NLP, or AI safety
  • LLMs experience
  • Reinforcement learning experience
  • Kubernetes clusters experience
  • Complex shared codebases experience

What the JD emphasized

  • Occasional travel to San Francisco
  • All interviews conducted in Python
  • AI safety

Other signals

  • AI safety research
  • steering AI behavior
  • understanding AI risks
  • experimental research