Research Scientist, Interpretability

Anthropic · AI Frontier · San Francisco, CA · AI Research & Engineering

Research Scientist focused on mechanistic interpretability of LLMs, aiming to understand how trained models work by reverse-engineering their parameters and algorithms. The role involves developing methods, designing experiments, creating interpretability features, building infrastructure, and collaborating with other teams. Requires a strong scientific research background with some prior interpretability work, comfort with experimental science, and proficiency in Python.

What you'd actually do

  1. Develop methods for understanding LLMs by reverse engineering algorithms learned in their weights
  2. Design and run robust experiments, both quickly in toy scenarios and at scale in large models
  3. Create and analyze new interpretability features and circuits to better understand how models work
  4. Build infrastructure for running experiments and visualizing results
  5. Work with colleagues to communicate results internally and publicly

Skills

Required

  • Python
  • scientific research
  • experimental design
  • data analysis
  • model interpretability

Nice to have

  • LLM understanding
  • reverse engineering
  • circuit analysis
  • experiment infrastructure
  • scientific communication

What the JD emphasized

  • mechanistic interpretability
  • interpretability

Other signals

  • mechanistic interpretability
  • reverse-engineer algorithms
  • understand model behavior