Research Scientist, AI Controls and Monitoring

Scale AI · Data AI · San Francisco, CA · Research

Research Scientist role focused on designing methods, systems, and experiments for AI controls and monitoring, with the goal of keeping advanced AI models and agents aligned with their intended goals even in high-stakes or adversarial environments. The work spans developing monitoring techniques, researching layered control mechanisms, designing red-team simulations, and collaborating with policymakers.

What you'd actually do

  1. Develop monitoring techniques and observability methods that track AI behavior in real time to identify and flag deviations, emergent capabilities, or anomalous outputs
  2. Research mechanisms for layered control, including fail-safes, oversight protocols, and intervention methods that can halt or redirect AI systems when risks are detected
  3. Design red-team simulations to probe weaknesses in oversight and control mechanisms, and build mitigations to close identified gaps
  4. Collaborate with policymakers, engineers, and other researchers to establish standards and benchmarks for AI monitoring and escalation

Skills

Required

  • designing methods, systems, and experiments for AI controls and monitoring
  • ensuring advanced AI models and agents remain aligned with intended goals
  • tracking AI behavior in real time
  • identifying and flagging deviations, emergent capabilities, or anomalous outputs
  • researching mechanisms for layered control
  • fail-safes, oversight protocols, and intervention methods
  • designing red-team simulations
  • building mitigations
  • collaborating with policymakers, engineers, and other researchers
  • establishing standards and benchmarks for AI monitoring and escalation
  • practical experience conducting technical research collaboratively
  • designing control and monitoring experiments for AI systems
  • building prototype systems
  • turning new ideas from the research literature into working prototypes
  • track record of published research in machine learning
  • generative AI
  • addressing sophisticated ML problems
  • strong written and verbal communication skills

Nice to have

  • runtime monitoring
  • anomaly detection
  • observability for ML systems
  • AI control or alignment research
  • scalable oversight
  • interpretability
  • debate
  • post-training and RL techniques such as RLHF, DPO, GRPO, and similar approaches

What the JD emphasized

  • AI Controls and Monitoring
  • AI risk evaluations
  • agent robustness
  • AI control protocols
  • AI risk
  • AI models and agents remain aligned with intended goals
  • high-stakes or adversarial environments
  • monitoring techniques
  • observability methods
  • anomalous outputs
  • layered control
  • fail-safes
  • oversight protocols
  • intervention methods
  • halt or redirect AI systems
  • red-team simulations
  • oversight and control mechanisms
  • mitigations
  • standards and benchmarks for AI monitoring and escalation
  • technical research collaboratively
  • control and monitoring experiments for AI systems
  • working prototypes
  • published research in machine learning
  • generative AI
  • sophisticated ML problems
  • runtime monitoring
  • anomaly detection
  • observability for ML systems
  • AI control or alignment research
  • scalable oversight
  • interpretability
  • debate
  • post-training and RL techniques
  • RLHF
  • DPO
  • GRPO

Other signals

  • AI controls and monitoring
  • AI risk evaluations
  • agent robustness
  • AI safety