Anthropic Fellows Program — AI Safety

Anthropic · AI Frontier · BC +3 · Remote · AI Research & Engineering

The Anthropic Fellows Program is a research fellowship focused on AI safety that aims to foster talent in empirical AI research. Fellows will work on projects aligned with Anthropic's research priorities, using external infrastructure and external models, with the goal of producing a public output such as a paper submission. Key research areas include Scalable Oversight, Adversarial Robustness and AI Control, Model Organisms, Model Internals / Mechanistic Interpretability, and AI Welfare.

What the fellowship includes

  1. 4 months of full-time research
  2. Direct mentorship from Anthropic researchers
  3. Access to a shared workspace (in either Berkeley, California or London, UK)
  4. Connection to the broader AI safety and security research community
  5. Weekly stipend of 3,850 USD / 2,310 GBP / 4,300 CAD, plus benefits (which vary by country)

Skills

Required

  • Python programming

Nice to have

  • Strong technical background in computer science, mathematics, or physics
  • Experience in research or engineering areas related to the Fellow's workstream
  • Strong background in a discipline relevant to a specific Fellows workstream (e.g. economics, social sciences, or cybersecurity)

What the JD emphasized

  • public output
  • paper submission
  • Fluent in Python programming
  • Available to work full-time on the Fellows program

Other signals

  • AI Safety
  • Scalable Oversight
  • Adversarial Robustness
  • Model Internals
  • Mechanistic Interpretability