Anthropic AI Safety Fellow, Canada

Anthropic · AI Frontier · AI Research & Engineering

This is a fellowship program focused on AI safety research, aiming to bridge the gap between industry engineering expertise and research skills. Fellows work on empirical projects using external infrastructure, with the goal of producing public outputs such as paper submissions. The program offers mentorship, funding, and compute resources.

What you'd actually do

  1. Fellows will use external infrastructure (e.g. open-source models, public APIs) to work on an empirical project aligned with our research priorities, with the goal of producing a public output (e.g. a paper submission).
  2. Fellows will receive substantial support - including mentorship from Anthropic researchers, funding, compute resources, and access to a shared workspace - enabling them to develop the skills to contribute meaningfully to critical AI safety research.
  3. Our mentors will lead projects in select AI safety research areas, such as: Scalable Oversight, Adversarial Robustness and AI Control, Model Organisms, Model Internals / Mechanistic Interpretability, AI Welfare.

Skills

Required

  • strong technical background in computer science, mathematics, physics, or a related field
  • strong programming skills in Python and machine learning frameworks
  • ability to thrive in fast-paced, collaborative environments
  • ability to execute projects independently while incorporating feedback on research direction

Nice to have

  • experience with empirical ML research projects
  • experience working with Large Language Models
  • experience in one of the research areas (e.g. Interpretability)
  • experience with deep learning

