Research Scientist, Interpretability

Anthropic · AI Frontier · AI Research & Engineering

A Research Scientist role focused on the mechanistic interpretability of large language models (LLMs): understanding how neural network parameters map to algorithms, with the goal of making models safe and steerable. The work involves developing methods, running experiments, building infrastructure, and communicating results.

What you'd actually do

  1. Develop methods for understanding LLMs by reverse engineering algorithms learned in their weights
  2. Design and run robust experiments, both quickly in toy scenarios and at scale in large models
  3. Build infrastructure for running experiments and visualizing results
  4. Work with colleagues to communicate results internally and publicly

Skills

Required

  • Python
  • scientific research
  • interpretability

Nice to have

  • team science
  • messy experimental science
  • research and engineering as two sides of the same coin
  • communicating results

What the JD emphasized

  • mechanistic interpretability
  • strong track record of scientific research
  • some prior work on interpretability

Other signals

  • mechanistic interpretability
  • reverse-engineer algorithms
  • understanding LLMs