Research Scientist, Agent Robustness

Scale AI · Data AI · San Francisco, CA · Research

Research Scientist focused on agent robustness, AI safety, and risk evaluations. The role involves researching AI agent capabilities, designing tests for harmful actions, creating exploits and mitigations for failure modes, and characterizing risks in multi-agent systems. Requirements include experience with post-training techniques such as RLHF, as well as a record of published research in generative AI.

What you'd actually do

  1. Research the science of AI agent capabilities, with a focus on how those capabilities relate to safety and risk factors and on methodologies for benchmarking them;
  2. Design and build harnesses to test AI agents’ tendency to take harmful actions when pressured to do so by users or tricked into doing so by elements of their environment;
  3. Design and build exploits and mitigations for new and unique failure modes that arise as AI agents gain affordances like coding, web browsing, and computer use;
  4. Characterize potential failure modes and broader risks of systems involving multiple interacting AI agents, and design mitigations for them.

Skills

Required

  • technical research
  • agent scaffolding
  • evaluation harnesses
  • working prototypes
  • post-training
  • RLHF
  • DPO
  • GRPO
  • published research in machine learning
  • generative AI
  • ML prototyping
  • debugging

Nice to have

  • agent evaluation frameworks
  • SWE-bench
  • WebArena
  • OSWorld
  • Inspect
  • red-teaming
  • prompt injection
  • adversarial testing

What the JD emphasized

  • fundamental challenges of building AI agents that are safe and aligned with humans
  • harmful actions
  • failure modes
  • multiple interacting AI agents
  • post-training
  • RLHF
  • published research in machine learning
  • sophisticated ML problems

Other signals

  • research
  • agent robustness
  • AI safety
  • risk evaluations
  • benchmarking
  • failure modes
  • exploits
  • mitigations
  • post-training
  • RLHF
  • published research
  • generative AI
  • ML prototyping
  • debugging