Research Engineer, Interpretability

Anthropic · AI Frontier · AI Research & Engineering

Research Engineer focused on mechanistic interpretability to understand and improve the safety of large language models. This involves implementing and analyzing experiments, optimizing research workflows, building tools for experimentation, and developing infrastructure to support model safety improvements.

What you'd actually do

  1. Implement and analyze research experiments, both quickly in toy scenarios and at scale in large models
  2. Set up and optimize research workflows to run efficiently and reliably at large scale
  3. Build tools and abstractions to support a rapid pace of research experimentation
  4. Develop and improve tools and infrastructure to support other teams in using Interpretability’s work to improve model safety

Skills

Required

  • Python
  • Rust
  • Go
  • Java
  • Experience with empirical AI research projects
  • Ability to prioritize and direct effort
  • Comfort operating with ambiguity
  • Willingness to question assumptions

Nice to have

  • Designing a code base so that anyone can quickly code experiments, launch them, and analyze their results without hitting bugs
  • Optimizing the performance of large-scale distributed systems
  • Collaborating closely with researchers
  • Language modeling with transformers
  • GPUs
  • PyTorch

What the JD emphasized

  • mechanistic interpretability
  • reverse engineer how trained models work
  • extract millions of meaningful features
  • change the model’s behavior
  • improve the safety of LLMs
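The "extract millions of meaningful features" point refers to the sparse-autoencoder approach used in interpretability work: train an overcomplete autoencoder on model activations so that each learned direction fires sparsely and ideally corresponds to an interpretable feature. As a rough illustration only, here is a toy forward pass of such an autoencoder in NumPy; all dimensions, weights, and the sparsity coefficient are made up for the sketch and are not taken from the posting.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Illustrative sizes: a small "model" dimension expanded into more features.
d_model, d_features = 64, 256

# Random weights stand in for trained encoder/decoder matrices.
W_enc = rng.normal(scale=0.1, size=(d_model, d_features))
W_dec = rng.normal(scale=0.1, size=(d_features, d_model))

# A batch of stand-in activations (in practice, residual-stream activations).
acts = rng.normal(size=(8, d_model))

feats = relu(acts @ W_enc)     # sparse, non-negative feature activations
recon = feats @ W_dec          # reconstruction of the original activations

# Training objective: reconstruction error plus an L1 penalty that
# pushes most feature activations to zero (the "sparse" in SAE).
loss = np.mean((recon - acts) ** 2) + 1e-3 * np.abs(feats).mean()
```

In real interpretability pipelines the same structure is scaled up enormously (millions of features, activations harvested from production models), which is where the engineering emphasis on workflows and infrastructure in this posting comes from.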
