Technical Lead, Safety Research

OpenAI · AI Frontier · San Francisco, CA · Safety Systems

This role is a Technical Lead for Safety Research at OpenAI, focused on advancing AI safety and alignment. The team works on implementing robust, safe behavior in AI models, developing new evaluations for misalignment, and supporting human oversight. The lead sets research directions and strategy, drives exploratory research, coordinates with cross-functional teams to ensure strong safety results, evaluates model safety, and conducts research on topics such as RLHF, adversarial training, and robustness. The role requires a strong track record in AI safety research, experience leading research efforts, and a thorough understanding of deep learning.

What you'd actually do

  1. Set the research directions and strategies to make our AI systems safer, more aligned and more robust.
  2. Coordinate and collaborate with cross-functional teams, including the rest of the research organization, Trust & Safety (T&S), policy, and related alignment teams, to ensure that our AI meets the highest safety standards.
  3. Actively evaluate and understand the safety of our models and systems, identifying areas of risk and proposing mitigation strategies.
  4. Conduct state-of-the-art research on AI safety topics such as RLHF, adversarial training, robustness, and more.
  5. Implement new methods in OpenAI’s core model training and launch safety improvements in OpenAI’s products.

Skills

Required

  • AI safety research
  • RLHF
  • adversarial training
  • robustness
  • fairness & biases
  • deep learning research
  • engineering skills
  • collaboration

Nice to have

  • Ph.D. or other degree in computer science, machine learning, or a related field

What the JD emphasized

  • strong track record of practical research on safety and alignment
  • led large research efforts in the past
  • 4+ years of experience in the field of AI safety
  • safety work for AI model deployment

Other signals

  • advancing capabilities for precisely implementing robust, safe behavior in AI models and systems
  • developing new evaluations to elicit or detect misalignment or inner goals of the AI
  • new methods to support human oversight of long-running tasks
  • implement new methods in OpenAI’s core model training and launch safety improvements in OpenAI’s products