Research Engineer, Frontier Evals & Environments - Finance

OpenAI · AI Frontier · San Francisco, CA · Research

OpenAI is seeking a Research Engineer for the Frontier Evals team, focusing on building evaluations for AI models in the finance domain. The role involves identifying crucial financial capabilities, designing quantification methods, owning a research agenda for evaluation development, and refining frontier model assessments. This position is critical for steering AI progress towards safe AGI/ASI.

What you'd actually do

  1. Identify important model capabilities, skills, and behaviors that are crucial to financial workflows, and design methods to quantify performance in these areas
  2. Own and pursue a research agenda to identify an important model capability (especially as it relates to financial reasoning) and build evals to measure it
  3. Continuously refine evaluations of frontier AI models to assess the extent of frontier capabilities

Skills

Required

  • strong engineering skills
  • statistical analysis skills
  • passionate about evals for real world applications and knowledge work
  • detail-oriented and thorough
  • team player
  • willing to do a variety of tasks to move the team forward
  • passionate and knowledgeable about AGI/ASI measurement
  • able to operate effectively in a dynamic and extremely fast-paced research environment
  • able to scope and deliver projects end-to-end

Nice to have

  • able to work cross-functionally
  • excellent communication skills

What the JD emphasized

  • own individual threads within this endeavor end-to-end
  • build evals to measure it

Other signals

  • builds north star model evaluations
  • drive progress towards safe AGI/ASI
  • measure and steer our models
  • creates self-improvement loops to steer our training, safety, and launch decisions