Agent Post-training, Computer Use Research

OpenAI · AI Frontier · San Francisco, CA · Research

This role focuses on training frontier AI agents to operate computers, including navigating browsers and desktops, using tools, reasoning through complex workflows, and collaborating with users and other agents. It involves designing and running experiments to improve agentic behavior, owning end-to-end improvements to the post-training stack (RL, data pipelines, graders, reward signals, evals), building evals and environments, and partnering with product teams to translate product signal into model improvements. The work directly shapes computer-use capabilities shipped in OpenAI's next generation of agents and sits at the intersection of frontier model training, product behavior, evaluation, and systems engineering.

What you'd actually do

  1. Design and run experiments that improve agentic model behavior for complex [computer use](https://openai.com/index/codex-for-almost-everything/) across desktop and browser environments.
  2. Own end-to-end improvements to the post-training stack, including RL, data pipelines, graders, reward signals, evals, diagnostics, and model-behavior analysis.
  3. Build evals and environments that expose the next set of model failures, then turn those failures into training data, product fixes, or new research directions.
  4. Partner with Codex and ChatGPT product teams to understand what users need and translate product signal into model improvements.
  5. Work on early-training and alignment interventions, including data mixtures, objectives, synthetic data, and eval loops that shape downstream agent behavior.

Skills

Required

  • Machine learning
  • Software engineering
  • Systems
  • Statistics
  • LLMs
  • RL
  • RLHF/RLAIF
  • Post-training
  • Evals
  • Graders
  • Synthetic data
  • Model training
  • Coding agents
  • Tool-using agents
  • Production ML systems
  • Experiment design
  • Data pipelines
  • Model behavior analysis
  • Product signal translation
  • Alignment interventions
  • Cross-functional collaboration

Nice to have

  • Computer use (desktop/browser navigation)
  • Multi-agent coordination
  • Long-horizon execution
  • Factuality
  • Instruction following
  • Calibrated reasoning
  • Taste
  • Open-ended problem solving
  • Research taste
  • Engineering execution
  • Product impact focus
  • Clear communication across disciplines

What the JD emphasized

  • frontier agents
  • computer use
  • post-training
  • agentic model behavior
  • product signal
  • model runs
  • agent behavior
  • model failures
  • training data
  • product fixes
  • research directions
  • product teams
  • model improvements
  • early-training
  • alignment interventions
  • data mixtures
  • objectives
  • synthetic data
  • eval loops
  • major model runs
  • model training
  • product infrastructure
  • production agent harness
  • multi-agent systems
  • production-like environments
  • shipped or near-shipped models
  • qualitative behavior
  • concrete hypotheses
  • experiments
  • fixes
  • technical fundamentals
  • machine learning
  • software engineering
  • systems
  • statistics
  • LLMs
  • RL
  • RLHF/RLAIF
  • evals
  • graders
  • coding agents
  • tool-using agents
  • production ML systems
  • open-ended problems
  • research taste
  • engineering execution
  • product impact
  • model behavior
  • benchmark movement
  • agent useful
  • reliable
  • honest
  • tasteful
  • easy to work with
  • vague behavioral problem
  • concrete experiment
  • hypothesis
  • pipeline
  • model
  • result
  • research
  • product
  • infrastructure
  • data
  • safety boundaries
  • load-bearing systems
  • processes
  • train and ship the models
  • agents genuinely useful
  • developers
  • enterprises
  • researchers
  • everyday users

Other signals

  • agent training
  • computer use
  • post-training
  • model capabilities
  • product impact