Agent Post-training, Connectors Research

OpenAI · AI Frontier · San Francisco, CA · Research

Research role focused on post-training frontier agents, specifically teaching models to interface with professional software and tools via code and APIs. The role involves designing experiments, owning the post-training stack (RL, data pipelines, graders, reward signals, evals), building evals and environments, and partnering with product teams to improve model behavior for enterprise applications.

What you'd actually do

  1. Design and run experiments that improve agentic model behavior for complex software and plugins.
  2. Own end-to-end improvements to the post-training stack, including RL, data pipelines, graders, reward signals, evals, diagnostics, and model-behavior analysis.
  3. Build evals and environments that expose the next set of model failures, then turn those failures into training data, product fixes, or new research directions.
  4. Partner with Codex and ChatGPT product teams to understand what users need and translate product signal into model improvements.
  5. Work on early-training and alignment interventions, including data mixtures, objectives, synthetic data, and eval loops that shape downstream agent behavior.

Skills

Required

  • strong technical fundamentals in machine learning, software engineering, systems, statistics, or a related field
  • hands-on experience with LLMs, RL, RLHF/RLAIF, post-training, evals, graders, synthetic data, model training, coding agents, tool-using agents, or production ML systems
  • ability to move from a vague behavioral problem to a concrete experiment
  • comfortable working across research, product, infrastructure, data, evals, and safety boundaries
  • ability to communicate clearly with each of these groups

Nice to have

  • learn quickly across the parts you have not worked in before
  • excited by open-ended problems where the path is unclear, the signal is noisy, and the right answer requires both research taste and engineering execution
  • care about product impact and model behavior
  • opinions about what makes an agent useful, reliable, honest, tasteful, and easy to work with
  • like building load-bearing systems and processes when that is what the team needs

What the JD emphasized

  • teach models how to interface with the top professional software using code
  • train agents to use code, APIs, tools, and structured integrations to operate across applications
  • enable models to take useful actions across a user’s digital context
  • turning connected tools into a powerful action surface for our agents
  • hands-on experience with LLMs, RL, RLHF/RLAIF, post-training, evals, graders, synthetic data, model training, coding agents, tool-using agents, or production ML systems

Other signals

  • training frontier agents
  • teaching models to interface with professional software using code
  • enabling models to take useful actions across a user’s digital context
  • turning connected tools into a powerful action surface for our agents