Researcher, Automated Red Teaming

OpenAI · AI Frontier · San Francisco, CA · Safety Systems

This role leads the Automated Red Teaming (ART) effort, focusing on building scalable, research-driven systems that uncover failure modes in AI models and their safeguards. The goal is to translate these findings into actionable improvements and to reduce expected harm by identifying weaknesses early and reliably. The work spans automated classifier jailbreak discovery, bio threat-development elicitation, and chain-of-thought (CoT) monitoring evasion probing, with a strong emphasis on applied research, evaluations, and scalable automation.

What you'd actually do

  1. Own the research and technical direction for automated red teaming across catastrophic risk areas, with an initial emphasis on:
     - Automated classifier jailbreak discovery (cyber and bio).
     - Automated bio threat-development elicitation (worst-feasible planning uplift).
     - CoT monitoring evasion probing (and adjacent loss-of-control evaluations).
  2. Partner closely with:
     - Vertical risk teams (Cyber, Bio, Loss of Control) to define threat models, prioritize targets, and land mitigations.
     - The Classifiers team to turn discovered attacks into training data, evals, and measurable robustness gains.
     - Product / Engineering / Safety stakeholders to ensure ART outputs are operationally useful.

Skills

Required

  • applied research instincts
  • designing experiments that are reproducible, interpretable, and hard to fool
  • hands-on experience with LLMs and agents, including multi-turn behaviors and tool use
  • building scalable automation
  • solid software engineering fundamentals: data structures, algorithms, and testing discipline
  • working effectively in a production-adjacent environment
  • thinking in threat models and incentives
  • translating messy findings into action
  • communicating clearly with researchers, engineers, product, and policy
  • driving alignment on what to fix first
  • efficiency and prioritization

Nice to have

  • experience in adversarial ML
  • security research / red teaming
  • abuse prevention systems
  • large-scale eval infrastructure

What the JD emphasized

  • catastrophic risk
  • AI safety
  • reducing real-world catastrophic risk
  • automated red teaming
  • failure modes
  • safeguards
  • frontier models
  • AGI preparedness
  • catastrophic risks related to frontier AI systems
  • automated classifier jailbreak discovery
  • automated bio threat-development elicitation
  • CoT monitoring evasion probing
  • loss-of-control evaluations
  • adversarial ML
  • security research / red teaming
  • abuse prevention systems
  • large-scale eval infrastructure

Other signals

  • building scalable, research-driven systems
  • continuously uncover failure modes
  • translate findings into actionable, production-facing improvements
  • reduce expected harm by finding the highest-leverage, least-covered weaknesses early and reliably