ML/Research Engineer, Safeguards

Anthropic · AI Frontier · San Francisco, CA · AI Research & Engineering

ML/Research Engineer focused on detecting and mitigating misuse of AI systems: building classifiers, monitoring for harms, evaluating agentic product safety, and conducting research on red-teaming and adversarial robustness.

What you'd actually do

  1. Develop classifiers to detect misuse and anomalous behavior at scale. This includes developing synthetic data pipelines for training classifiers and methods to automatically source representative evaluations to iterate on.
  2. Build systems to monitor for harms that span multiple exchanges, such as coordinated cyber attacks and influence operations, and develop new methods for aggregating and analyzing signals across contexts
  3. Evaluate and improve the safety of agentic products by developing threat models and environments to test for agentic risks, and by developing and deploying mitigations for prompt injection attacks
  4. Conduct research on automated red-teaming, adversarial robustness, and other areas that help test for or find misuse

Skills

Required

  • Python
  • Building ML systems
  • Exploratory experiments
  • Production systems
  • Communication skills

Nice to have

  • Language modeling and transformers
  • Building classifiers, anomaly detection systems, or behavioral ML
  • Adversarial machine learning or red-teaming
  • Interpretability or probes
  • Reinforcement learning
  • High-performance, large-scale ML systems

What the JD emphasized

  • 4+ years of experience in ML engineering, research engineering, or applied research
  • Concern about misuse risks of AI systems and a desire to work on mitigating them

Other signals

  • Develop classifiers to detect misuse and anomalous behavior at scale
  • Build systems to monitor for harms that span multiple exchanges
  • Evaluate and improve the safety of agentic products
  • Conduct research on automated red-teaming, adversarial robustness