Researcher, Alignment CoT Monitorability

OpenAI · AI Frontier · San Francisco, CA · Research

Researcher focused on studying and improving the monitorability of chain-of-thought reasoning in frontier AI models, with applications in alignment and safety. The role involves designing and running empirical studies, building evaluations, analyzing model behavior, and translating findings into practical oversight and training recommendations.

What you'd actually do

  1. Design and run empirical studies of chain-of-thought monitorability across frontier reasoning models and training settings.
  2. Build evaluations that measure whether monitors can reliably predict properties of interest, including high-stakes forms of misbehavior.
  3. Investigate how pre-training, synthetic data, mid-training, post-training, reinforcement learning, and other interventions improve or degrade monitorability.
  4. Analyze model behavior and turn observations from monitoring into hypotheses, experiments, and recommendations.
  5. Translate research findings into practical monitoring and oversight approaches that can inform real training runs.

Skills

Required

  • Strong empirical ML expertise
  • Deep interest in model behavior, alignment, or interpretability
  • Ability to design and run empirical studies
  • Ability to build evaluations
  • Ability to analyze model behavior
  • Ability to translate findings into practical recommendations
  • Experience training, evaluating, or debugging large ML models, especially LLMs

Nice to have

  • Direct chain-of-thought interpretability experience
  • Experience with reward functions, environments, or training interventions
  • Experience with externally publishable research

What the JD emphasized

  • monitorability
  • alignment
  • interpretability
  • model behavior
  • scalable oversight
  • chain-of-thought