Data Scientist, Codex

OpenAI · AI Frontier · San Francisco, CA · Data Science

This role focuses on measuring and accelerating product-market fit for AI developer tools, specifically within the Codex product team at OpenAI. The Data Scientist will define developer productivity metrics, design and interpret A/B tests for new coding models and features, and analyze model performance across various languages and tasks. The insights generated will directly influence the product's direction and the broader software engineering industry.

What you'd actually do

  1. Embed with the Codex product team to discover opportunities that improve developer outcomes and growth
  2. Design and interpret A/B tests and staged rollouts of new coding models and product features
  3. Define and operationalize metrics such as suggestion acceptance, edit distance, compile/test pass rates, task completion, latency, and session productivity
  4. Build dashboards and analyses that help the team self-serve answers to product questions (by language, framework, repo size, task type)
  5. Diagnose failure modes and partner with Research on targeted improvements (model quality signals, user feedback, evals)

Skills

Required

  • 5+ years in a quantitative role on a developer-facing or high-growth product
  • Fluency in SQL and Python
  • Comfort with experiment design and causal inference
  • Experience defining product metrics tied to user value
  • Ability to communicate clearly with PM, Eng, and Design—and to influence product direction

Nice to have

  • Strong programming background; ability to prototype, run simulations, and reason about code quality
  • Familiarity with IDE/extension telemetry or developer tooling analytics
  • Prior experience with NLP/LLMs, code models, or evaluations for generative coding

What the JD emphasized

  • new coding models

Other signals

  • measure and accelerate product-market fit for AI developer tools
  • define what “developer productivity” means for our product
  • run experiments on new coding models and UX
  • pinpoint where the model helps or hurts across languages and tasks
  • insights will directly shape how an entire industry builds software