Safeguards Enforcement Analyst, Safety Evaluations

Anthropic · AI Frontier · United States · Remote · Safeguards (Trust & Safety)

This role centers on evaluating AI models against safety and policy standards: running and monitoring evaluations, driving mitigations when issues surface, and coordinating the creation of new evaluation frameworks. It is deeply cross-functional, partnering with policy experts and engineering teams to ensure model behavior meets required standards and to build scalable evaluation processes.

What you'd actually do

  1. Support model launch readiness by running evaluations, monitoring and interpreting results, and surfacing regressions or unexpected behavior changes to relevant stakeholders
  2. Partner closely with policy and domain experts throughout the evaluation lifecycle — from identifying risks and scoping the right evaluation approach, to coordinating creation of new evals and ensuring existing ones remain current with evolving policies, threat vectors, and model capabilities
  3. Work with cross-functional stakeholders to help manage evaluation outcomes, including interpreting results and driving mitigations where needed
  4. Think strategically about evaluation quality, building processes and eval paradigms that keep evaluations unsaturated, high-signal, and insightful as models improve
  5. Build out processes and frameworks for creating product-specific evaluations as Anthropic's product surface area expands

Skills

Required

  • Experience in trust and safety, content operations, policy enforcement, or a related operational role at a technology company
  • Experience building processes, workflows, or programs from scratch
  • Strong program management instincts
  • Ability to manage multiple concurrent workstreams across different domain areas
  • Strong prioritization and context-switching skills
  • Clear and concise communication (written and cross-functional)

Nice to have

  • Experience operating under tight, high-stakes timelines
  • Experience coordinating across engineering, policy, and product teams
  • Experience building and maintaining SOPs, runbooks, and operational documentation
  • Proficiency with data tools (SQL, dashboards, spreadsheets)
  • Comfort working with sensitive content areas

What the JD emphasized

  • enforcing our policies
  • protecting users
  • ensuring our platform is not misused
  • run and monitor evaluations
  • drive mitigations when issues surface
  • coordinate the creation of new evals
  • build the processes and documentation that allow the team to scale this work over time
  • detail-oriented
  • comfortable navigating ambiguity
  • coordinating across teams
  • break new ground
  • drive work to completion
  • deeply cross-functional
  • ensure our evaluations are comprehensive and current
  • findings translate into meaningful improvements to model behavior
  • Support model launch readiness
  • running evaluations
  • monitoring and interpreting results
  • surfacing regressions or unexpected behavior changes
  • Partner closely with policy and domain experts
  • identifying risks
  • scoping the right evaluation approach
  • coordinating creation of new evals
  • ensuring existing ones remain current
  • evolving policies
  • threat vectors
  • model capabilities
  • manage evaluation outcomes
  • interpreting results
  • driving mitigations
  • Think strategically about eval quality
  • build processes and eval paradigms
  • evaluations unsaturated
  • high-signal
  • insightful
  • models improve
  • Build out processes and frameworks
  • creating product-specific evaluations
  • Anthropic's product surface area expands
  • design and scope tooling improvements
  • accommodate evolving eval needs
  • expand self-serve eval creation and iteration
  • non-technical users
  • Write and maintain rigorous documentation
  • evaluation creation
  • execution
  • interpretation
  • team builds out eval tooling and processes
  • experience in trust and safety
  • content operations
  • policy enforcement
  • related operational role
  • technology company
  • Thrive in ambiguous, fast-moving environments
  • energized rather than frustrated
  • path forward isn't clearly defined
  • figure it out as you go
  • experience building processes, workflows, or programs from scratch
  • zero-to-one work
  • not just maintaining existing ones
  • strong program management instincts
  • creating structure around complex, multi-stakeholder efforts
  • tracking timelines
  • dependencies
  • deliverables
  • keep work on track
  • eager to expand your technical toolkit
  • adopting internal tools
  • AI-assisted workflows
  • Claude Code
  • accelerate your work
  • manage multiple concurrent workstreams
  • different domain areas
  • without losing track of details
  • strong prioritization
  • context-switching are essential
  • deadlines and priorities shift quickly
  • strong generalist
  • comfortable moving fluidly across different types of work
  • switching contexts throughout the day
  • comfortable making judgment calls with incomplete information
  • escalating appropriately
  • Communicate clearly and concisely
  • writing and cross-functionally
  • Experience operating under tight, high-stakes timelines
  • product launch cycles
  • incident response
  • regulatory deadlines
  • information and priorities can shift with little notice
  • Experience coordinating across engineering, policy, and product teams
  • translate findings into concrete action
  • Experience building and maintaining SOPs, runbooks, and operational documentation
  • fast-changing environments
  • Proficiency with data tools
  • SQL
  • dashboards
  • spreadsheets
  • maintain and improve workflows
  • Comfort working with sensitive content areas
  • eval creation
  • enforcement review responsibilities

Other signals

  • evaluating models against safety and policy standards
  • running and monitoring evaluations
  • driving mitigations when issues surface
  • coordinating the creation of new evals
  • building processes and documentation to scale work