Research Engineer, Frontier Red Team (RSP Evaluations)

Anthropic · AI Frontier · AI Policy & Societal Impacts

Research Engineer focused on developing and running "gold standard" evaluations of catastrophic risks to ensure the safe release of frontier AI models, in line with the Responsible Scaling Policy (RSP). The role involves creating evaluation systems, collaborating with domain experts, building sandboxed testing environments, and informing critical deployment decisions.

What you'd actually do

  1. Design and implement robust evaluation infrastructure to measure model capabilities and risks across multiple domains
  2. Lead technical projects to build and scale evaluation systems that could become industry standards
  3. Collaborate with domain experts to translate their insights into concrete evaluation frameworks
  4. Build sandboxed testing environments and automated pipelines for continuous model assessment
  5. Work closely with researchers to rapidly prototype and iterate on new evaluation approaches

Skills

Required

  • Python
  • software engineering skills
  • fast, iterative experiments with frontier AI models
  • designing or implementing evaluations that involve sampling from and prompting LLMs
  • clean, well-structured code
  • distributed systems
  • defining technical specifications and executing towards them
  • self-starter
  • fast-paced, collaborative environments
  • tackling unprecedented technical challenges
  • balancing the urgency of the mission with careful, methodical implementation

Nice to have

  • Experience working on sensitive or security-critical projects
  • Understanding of AI safety concepts and concerns
  • Background in one or more relevant domains (biosecurity, cybersecurity, and others)

What the JD emphasized

  • Responsible Scaling Policy (RSP)
  • dangerous capabilities
  • ASL thresholds
  • evaluation systems
  • evaluation infrastructure
  • evaluation frameworks
  • evaluation approaches
  • model evaluations
  • evaluations

Other signals

  • Responsible Scaling Policy (RSP)
  • evaluate dangerous capabilities in models
  • determine if and when they cross ASL thresholds
  • establish standards that could influence the entire AI industry