Software Engineer, Safeguards Infrastructure

Anthropic · AI Frontier · London, United Kingdom · Safeguards (Trust & Safety)

Software Engineer focused on building foundational systems for AI safety, including infrastructure for data management, metric and evaluation systems, and tooling for human and agentic review. The role involves keeping Safeguards systems running day to day and building robust, reliable, multi-layered defenses that enable real-time improvement of safety mechanisms at scale.

What you'd actually do

  1. Develop the foundational systems which power Safeguards, including infrastructure for data storage and management, metric and evaluation systems, and tooling for human and agentic review.
  2. Ensure the day-to-day running of Safeguards systems and hold a high operational bar that serves both safety and customers while reducing the human intervention and oversight required.
  3. Build robust and reliable multi-layered defenses for real-time improvement of safety mechanisms that work at scale.

Skills

Required

  • Python
  • ability to work across the stack
  • strong communication skills
  • ability to explain complex technical concepts to non-technical stakeholders

Nice to have

  • TypeScript
  • Rust
  • experience building trust and safety, anti-spam, fraud or abuse detection and mitigation mechanisms and interventions for AI/ML systems
  • experience building metrics and measurement systems or data and privacy management systems
  • experience with Claude Code or similar agentic coding tools

What the JD emphasized

  • Safeguards
  • safety
  • oversight
  • intervention
  • models
  • misuse
  • user well-being
  • unwanted model behaviors
  • disallowed use
  • principles of safety
  • transparency
  • foundational systems
  • data storage and management
  • metric and evaluation systems
  • tooling for human and agentic review
  • Safeguards systems
  • operational bar
  • customers
  • human intervention and oversight
  • robust and reliable multi-layered defenses
  • real-time improvement of safety mechanisms
  • scale
  • trust and safety
  • anti-spam
  • fraud or abuse detection and mitigation mechanisms and interventions for AI/ML systems
  • metrics and measurement systems
  • data and privacy management systems
  • operational teams
  • custom internal tooling
  • agentic coding tools

Other signals

  • building foundational systems for safety
  • monitoring models
  • preventing misuse
  • detecting unwanted model behaviors
  • building multi-layered defenses for real-time improvement of safety mechanisms