Data Scientist, Integrity Measurement

OpenAI · AI Frontier · London, United Kingdom · Data Science

Data Scientist focused on measuring and responding to adversarial threats and misuse on OpenAI's platforms, developing AI-first methods for prevalence estimation and productionised safety metrics, and optimising LLM prompts for measurement. This role involves owning measurement and metrics for severe harm verticals, informing improvements to detection and enforcement, and leveraging agentic products for automation.

What you'd actually do

  1. own measurement and quantitative analysis for a group of severe, actor- and network-based usage harm verticals
  2. develop and implement AI-first methods for prevalence measurement and other productionised safety metrics, which may need to include off-platform indicators or other non-standard datasets
  3. build metrics that can be used for goaling or A/B tests when prevalence or other top-line metrics are not suitable
  4. own dashboards and metrics reporting for harm verticals
  5. conduct analyses and generate insights that inform improvements to review, detection, or enforcement, and that influence roadmaps

Skills

Required

  • trust and safety experience
  • driving measurement direction
  • deep statistics skills
  • sampling methods
  • prevalence estimation
  • data programming languages (R or Python, and SQL)
  • experience with AI harms or with leveraging AI for measurement

Nice to have

  • activity- rather than content-based prevalence estimation
  • experience with severe and sensitive harm areas like child safety or violence

What the JD emphasized

  • experienced trust and safety data scientists
  • drive measurement direction
  • deep statistics skills, specifically around sampling methods and prevalence estimation of complicated problem areas
  • experience working with severe and sensitive harm areas
  • AI harms or leveraging AI for measurement

Other signals

  • measurement for complex, actor- and sometimes network-level harms
  • AI-first methods for prevalence measurement
  • productionised safety metrics
  • optimise LLM prompts for the purpose of measurement
  • develop automation to scale yourself, leveraging our agentic products