Senior Data Scientist - Generative AI

Roblox · Consumer · San Mateo, CA · Data Science

This Senior Data Scientist role at Roblox focuses on Generative AI, specifically on developing evaluation frameworks and automated systems for GenAI features and internal AI Agents. The role involves running experiments, defining success metrics, and driving strategy on AI safety, quality, and efficiency for both end-users and developers. It requires strong data science skills (including experimentation and causal inference) and familiarity with GenAI evaluation methods.

What you'd actually do

  1. Develop Evaluation Frameworks: Design and operationalize rigorous evaluation systems for either GenAI features (text, image, video, 3D, 4D) or internal AI Agents (Code Review, Refactor, Test Gen). This includes evaluation experiment design, dataset design, label reliability analysis, and implementing and fine-tuning LLM-as-judge methods.
  2. Run Rigorous Experiments: Conduct online experiments (A/B tests) and causal inference analyses to quantify the impact of GenAI features or AI-assisted coding tools. You will identify opportunities, measure lift, and ensure statistical rigor.
  3. Define Success Metrics: Partner with cross-functional teams to define leading and lagging indicators, whether for GenAI safety and user satisfaction or for engineering productivity and code health.
  4. Build Automated Systems: Research and apply state-of-the-art methodologies to build reproducible evaluation tooling and agentic workflows that improve rigor and efficiency across the company.
  5. Drive Strategy & Visibility: Develop dashboards and reporting frameworks that reveal trends (e.g., model performance or developer friction) and translate complex data into clear, prioritized recommendations for leadership.

Skills

Required

  • Advanced Degree: PhD or Master’s in Statistics, Economics, Computer Science, Applied Math, Physics, Engineering, or a related quantitative field.
  • Experience: 5+ years of experience in data science, analytics, or another quantitative role.
  • Technical Proficiency: Strong proficiency in SQL (Hive/Spark) for manipulating large datasets, and in scripting languages (Python or R) for analysis and modeling.
  • Experimentation and Causal Inference: A solid grounding in experimentation, causal inference, and statistical analysis, including test design and metric design for feature impact.
  • Problem Solving: A demonstrated track record of framing ambiguous problems, designing analytical approaches, and solving open-ended data science problems that drive business impact.
  • GenAI Familiarity: Familiarity with GenAI models and safety/quality evaluation methods.

Nice to have

  • Learning Agility: Ability to use AI tools effectively and responsibly to enhance productivity, and a passion for continuously improving methods in a fast-evolving field.
  • Model Training Lifecycle: Expertise in the model training lifecycle (e.g., fine-tuning, RLHF, or synthetic data generation).
  • Engineering Efficiency: Experience with engineering development workflows and engineering efficiency data, particularly for the Engineering Efficiency and Code Intelligence role.
  • Applied Research Background: A track record of applied research or publications in relevant technical fields is highly valued.

What the JD emphasized

  • rigorous evaluation systems
  • LLM-as-judge methods
  • GenAI features
  • AI Agents
  • AI-assisted coding tools
  • GenAI safety
  • engineering productivity
  • evaluation tooling
  • agentic workflows
  • model performance
  • developer friction
  • GenAI Familiarity
  • model training lifecycle
  • fine-tuning
  • RLHF
  • synthetic data generation
  • engineering development workflow
  • engineering efficiency data
  • Applied Research Background

Other signals

  • Develop Evaluation Frameworks
  • Run Rigorous Experiments
  • Build Automated Systems
  • GenAI Familiarity
  • Applied Research Background