Applied AI Engineer, Codex Core Agent

OpenAI · AI Frontier · San Francisco, CA · Applied AI

This role focuses on improving the performance, reliability, and usefulness of AI agents for software engineering tasks. It involves designing agent behaviors, developing evaluation metrics, optimizing through prompting and tool use, analyzing production failures, and building feedback loops that improve models and agent capabilities. The goal is to bridge the gap between research potential and real-world application so that agents become dependable tools.

What you'd actually do

  1. Design and iterate on agent behaviors across real-world coding tasks and long-horizon workflows.
  2. Work closely with research to develop and run evals to measure agent performance, regressions, failure modes, and edge cases.
  3. Improve performance through prompting, tool-use strategies, context construction, and model-facing experimentation.
  4. Analyze failures in production and systematically improve robustness and reliability.
  5. Build feedback loops and data systems that get better real-task data into evaluation and research.

Skills

Required

  • Python
  • modern ML tooling
  • model evaluation
  • fine-tuning
  • prompt design
  • systems thinking
  • debugging real-world failures

Nice to have

  • agent frameworks
  • tool-using LLM systems
  • code generation models
  • developer tooling
  • large, messy datasets
  • production logs

What the JD emphasized

  • agent behaviors
  • agent performance
  • agentic systems
  • agent frameworks
  • tool-using LLM systems
  • code generation models
  • developer tooling

Other signals

  • agentic systems
  • LLM performance
  • production deployment