Agent Post-training, Artifacts Research

OpenAI · AI Frontier · San Francisco, CA · Research

This role focuses on training frontier AI agents to create polished, useful work products like documents, spreadsheets, and dashboards. It involves owning end-to-end improvements in the post-training stack, including RL, data pipelines, graders, reward signals, and evals. The role partners with product teams to translate user needs into model improvements and contributes to major model runs and product launches.

What you'd actually do

  1. Design and run experiments that improve agentic model behavior for complex software and plugins.
  2. Own end-to-end improvements to the post-training stack, including RL, data pipelines, graders, reward signals, evals, diagnostics, and model-behavior analysis.
  3. Build evals and environments that expose the next set of model failures, then turn those failures into training data, product fixes, or new research directions.
  4. Partner with Codex and ChatGPT product teams to understand what users need and translate product signal into model improvements.
  5. Work on early-training and alignment interventions, including data mixtures, objectives, synthetic data, and eval loops that shape downstream agent behavior.

Skills

Required

  • LLMs
  • RL
  • RLHF/RLAIF
  • post-training
  • evals
  • graders
  • synthetic data
  • model training
  • coding agents
  • tool-using agents
  • production ML systems

Nice to have

  • consulting
  • finance
  • marketing
  • operations
  • data science

What the JD emphasized

  • frontier agents
  • post-training stack
  • train frontier models
  • frontier products
  • frontier models
  • major model runs
  • production readiness

Other signals

  • training frontier agents
  • post-training stack improvements
  • shipping improvements into products