Research Engineer, Interpretability

Anthropic · AI Frontier · San Francisco, CA · AI Research & Engineering

Research Engineer focused on building and maintaining specialized infrastructure for interpretability research in AI systems. This involves developing tools for model analysis, optimizing training and inference pipelines, and ensuring reliability for safety audits, with a strong emphasis on understanding and controlling model behavior.

What you'd actually do

  1. Build and maintain the specialized inference and training infrastructure that powers interpretability research - including instrumented forward/backward passes, activation extraction, and steering vector application
  2. Resolve scaling and efficiency bottlenecks through profiling, optimization, and close collaboration with peer infrastructure teams
  3. Design tools, abstractions, and platforms that enable researchers to rapidly experiment without hitting engineering barriers
  4. Help bring interpretability research into production safety audits - with real deadlines and high reliability expectations
  5. Work across the stack - from model internals and accelerator-level optimization to user-facing research tooling

Skills

Required

  • 5-10+ years of experience building software
  • Highly proficient in at least one programming language (e.g., Python, Rust, Go, Java)
  • Productive with Python
  • Extremely curious about unfamiliar domains
  • Strong ability to prioritize the most impactful work
  • Comfortable operating with ambiguity and questioning assumptions
  • Prefer fast-moving collaborative projects
  • Care about the societal impacts and ethics of your work
  • Comfortable working closely with researchers, translating research needs into engineering solutions

Nice to have

  • Optimizing the performance of large-scale distributed systems
  • Language modeling fundamentals with transformers
  • High-performance LLM optimization: memory management, compute efficiency, parallelism strategies, inference throughput optimization
  • Working hands-on in a mainstream ML stack - PyTorch/CUDA on GPUs or JAX/XLA on TPUs
  • Collaborating closely with researchers and building tooling to support research teams
  • Directly performing research that involves complex engineering challenges

What the JD emphasized

  • mechanistic understanding is the most robust way to make advanced systems safe
  • engineering and infrastructure have become a bottleneck
  • real deadlines and high reliability expectations

Other signals

  • interpretability research
  • AI safety
  • reverse-engineering neural networks
  • specialized inference and training infrastructure