Senior Data Scientist - Generative AI

Roblox · Consumer · San Mateo, CA · Data Science

Senior Data Scientist role focused on Generative AI within Roblox's Foundation AI team. The role involves developing evaluation frameworks for GenAI features and internal AI Agents, running experiments and causal inference to measure impact, defining success metrics, building automated evaluation tooling and agentic workflows, and driving strategy through data analysis and reporting. The position requires expertise in experimentation, causal inference, SQL, Python/R, and familiarity with GenAI models and evaluation methods, with a strong emphasis on building and operationalizing rigorous evaluation systems.

What you'd actually do

  1. Design and operationalize rigorous evaluation systems for either GenAI features (text, image, video, 3D, 4D) or internal AI Agents (Code Review, Refactor, Test Gen). This includes eval experiment design, dataset design, label reliability analysis, and implementing and fine-tuning LLM-as-judge methods.
  2. Conduct online experiments (A/B tests) and causal inference to quantify the impact of GenAI features or AI-assisted coding tools. You will identify opportunities, measure lift, and ensure statistical rigor.
  3. Partner with cross-functional teams to define leading and lagging indicators, whether for GenAI safety and user satisfaction or for engineering productivity and code health.
  4. Research and apply state-of-the-art methodologies to build reproducible evaluation tooling and agentic workflows that lift rigor and efficiency across the company.
  5. Develop dashboards and reporting frameworks that reveal trends (e.g., model performance or developer friction) and translate complex data into clear, prioritized recommendations for leadership.

Skills

Required

  • SQL (Hive/Spark)
  • Python or R
  • Experimentation
  • Causal Inference
  • Statistical Analysis
  • Metric Design
  • GenAI Familiarity
  • Evaluation Methods

Nice to have

  • PhD or Master’s in Statistics, Economics, Computer Science, Applied Math, Physics, Engineering, or a related quantitative field
  • 5+ years of experience in data science, analytics, or a quantitative role
  • Model training lifecycle (fine-tuning, RLHF, synthetic data generation)
  • Engineering development workflow
  • Engineering efficiency data
  • Applied research background
  • Publications in relevant technical fields

What the JD emphasized

  • rigorous evaluation systems
  • LLM-as-judge methods
  • GenAI features
  • AI Agents
  • AI-assisted coding tools
  • GenAI safety
  • engineering productivity
  • agentic workflows
  • model performance
  • developer friction

Other signals

  • Develop Evaluation Frameworks
  • Run Rigorous Experiments
  • Build Automated Systems
  • GenAI Familiarity
  • Applied Research Background