Safeguards Analyst, Human Exploitation & Abuse

Anthropic · AI Frontier · United States · Remote · Safeguards (Trust & Safety)

This role focuses on building and operating enforcement systems that detect and mitigate the misuse of AI products for human exploitation and abuse. Day to day, it involves tuning classifiers, curating evaluation datasets, conducting investigations with data analysis tools, and collaborating with product and engineering teams to develop detection signals and mitigations. It also covers external partnerships and staying ahead of evolving misuse tactics.

What you'd actually do

  1. Design and architect automated enforcement systems and review workflows for human exploitation and abuse, ensuring they scale effectively while maintaining high accuracy
  2. Partner with Product, Engineering, and Data Science teams to build and tune detection signals for human trafficking, sextortion, and image-based sexual abuse, and to develop custom mitigations for these sensitive policy areas
  3. Curate policy violation examples, maintain golden evaluation datasets, and track enforcement actions across both consumer and API surfaces
  4. Conduct deep-dive investigations into suspected exploitation activity, using SQL and other data analysis tools to uncover threat patterns and bad-actor behavior in large datasets, then produce clear, well-sourced intelligence reports that inform detection strategy and flag policy gaps to the Safeguards policy design team
  5. Study trends internally and across the broader ecosystem, including evolving trafficking and sextortion tactics, to anticipate how AI systems could be misused for exploitation as capabilities advance

Skills

Required

  • 3+ years of experience in trust and safety, content moderation, counter-exploitation work, or a related field
  • Subject matter expertise in one or more of: human trafficking, human exploitation and abuse, sextortion, image-based sexual abuse / non-consensual intimate imagery, or commercial sexual exploitation
  • Experience building or operating detection and review workflows for sensitive content, at a platform, NGO, hotline, or similar organization
  • Ability to use SQL, Python, and/or other data analysis tools to interact with large datasets and derive insights that support key decisions and recommendations
  • Demonstrated ability to analyze complex situations and make well-reasoned decisions under pressure
  • Sound judgment in distinguishing permitted content from exploitative content, and comfort working in areas where these lines require careful reasoning
  • Strong attention to detail and ability to maintain accurate documentation
  • Ability to collaborate with team members while navigating rapidly evolving priorities and workstreams

Nice to have

  • Familiarity with the NGO and industry ecosystem working on these harms (for example, Polaris Project, Thorn, NCMEC, IWF, StopNCII, or industry hash-sharing initiatives)
  • Experience conducting open-source investigations or threat actor profiling in a trust and safety, intelligence, or law enforcement context
  • Experience working with generative AI products, including writing effective prompts for content review and enforcement
  • A deep interest in AI safety and responsible technology development
  • Experience standing up real-world harm escalation pathways or working with law enforcement referral processes

What the JD emphasized

  • Subject matter expertise in one or more of: human trafficking, human exploitation and abuse, sextortion, image-based sexual abuse / non-consensual intimate imagery, or commercial sexual exploitation
  • Experience building or operating detection and review workflows for sensitive content, at a platform, NGO, hotline, or similar organization
  • Ability to use SQL, Python, and/or other data analysis tools to interact with large datasets and derive insights that support key decisions and recommendations
  • Sound judgment in distinguishing permitted content from exploitative content, and comfort working in areas where these lines require careful reasoning

Other signals

  • tuning classifiers
  • curating evaluation datasets
  • detection signals
  • custom mitigations
  • intelligence reports