What you'd actually do

Design and operationalize rigorous evaluation systems for either GenAI features (text, image, video, 3D, 4D) or internal AI Agents (Code Review, Refactor, Test Gen). This includes eval experiment design, dataset design, label reliability analysis, and implementing and fine-tuning LLM-as-judge methods.

Conduct online experiments (A/B tests) and causal inference to quantify the impact of GenAI features or AI-assisted coding tools. You will identify opportunities, measure lift, and ensure statistical rigor.

Partner with cross-functional teams to define leading/lagging indicators—whether for GenAI safety and user satisfaction, or for engineering productivity and code health.

Research and apply state-of-the-art methodologies to build reproducible evaluation tooling and agentic workflows that lift rigor and efficiency across the company.

Develop dashboards and reporting frameworks that reveal trends (e.g., model performance or developer friction) and translate complex data into clear, prioritized recommendations for leadership.

Skills

Required

SQL (Hive/Spark)
Python or R
Experimentation
Causal Inference
Statistical Analysis
Metric Design
GenAI Familiarity
Evaluation Methods

Nice to have

PhD or Master’s in Statistics, Economics, Computer Science, Applied Math, Physics, Engineering, or a related quantitative field
5+ years of experience in data science, analytics, or a quantitative role
Model training lifecycle (fine-tuning, RLHF, synthetic data generation)
Engineering development workflow
Engineering efficiency data
Applied research background
Publications in relevant technical fields

Every day, tens of millions of people come to Roblox to explore, create, play, learn, and connect with friends in 3D immersive digital experiences, all created by our global community of developers and creators.

At Roblox, we’re building the tools and platform that empower our community to bring any experience that they can imagine to life. Our vision is to reimagine the way people come together, from anywhere in the world, and on any device. We’re on a mission to connect a billion people with optimism and civility, and we’re looking for amazing talent to help us get there.

A career at Roblox means you’ll be working to shape the future of human interaction, solving unique technical challenges at scale, and helping to create safer, more civil shared experiences for everyone.

WHY DATA SCIENCE & ANALYTICS?

The Data Science & Analytics organization's mission is to increase our speed, frequency, and acumen in making decisions at scale by instilling a data-influenced approach to building products. We cover a wide area of the data spectrum, including analytical data engineering, product analytics, experimentation, causal inference, statistical modeling, and machine learning. Aligned and partnered with product verticals, we use this extensive tool belt to discover new opportunities and unmet use cases, influence and craft the product roadmap, prioritize work, build data products, and measure impact on our community of players and developers.

WHY GENERATIVE AI?

Our team’s mission is twofold: to enable Roblox Creators to bring GenAI capabilities to millions of users, and to empower Roblox Creators and our own engineers with AI-backed tools to deliver value faster. We drive this innovation with a core commitment to safety, responsibility, and quality.

As a Senior Data Scientist, you will play a critical role in a key area within our Foundation AI team:

Engineering Efficiency and Code Intelligence: Building the metrics, analytics, experimentation foundation, and AI workflows that power how Roblox engineers and creators build and ship with AI and intelligent code systems.

Whether you are focused on the end-user experience or the developer ecosystem, you will define how we measure safety, responsibility, quality, and efficiency. You will combine annotation analysis, design of experiments, causal inference, model-based evaluation methods (such as LLM-as-a-judge), optimization algorithms, and AI models to drive product decisions and model improvements.

You Will:

Develop Evaluation Frameworks: Design and operationalize rigorous evaluation systems for either GenAI features (text, image, video, 3D, 4D) or internal AI Agents (Code Review, Refactor, Test Gen). This includes eval experiment design, dataset design, label reliability analysis, and implementing and fine-tuning LLM-as-judge methods.
Run Rigorous Experiments: Conduct online experiments (A/B tests) and causal inference to quantify the impact of GenAI features or AI-assisted coding tools. You will identify opportunities, measure lift, and ensure statistical rigor.
Define Success Metrics: Partner with cross-functional teams to define leading/lagging indicators—whether for GenAI safety and user satisfaction, or for engineering productivity and code health.
Build Automated Systems: Research and apply state-of-the-art methodologies to build reproducible evaluation tooling and agentic workflows that lift rigor and efficiency across the company.
Drive Strategy & Visibility: Develop dashboards and reporting frameworks that reveal trends (e.g., model performance or developer friction) and translate complex data into clear, prioritized recommendations for leadership.

You Have:

Advanced Degree: PhD or Master’s in Statistics, Economics, Computer Science, Applied Math, Physics, Engineering, or a related quantitative field.
Experience: 5+ years of experience in data science, analytics, or a quantitative role.
Technical Proficiency: Strong proficiency in SQL (Hive/Spark) for manipulating large datasets and in scripting languages (Python or R) for analysis and modeling.
Experimentation and Causal Inference: A solid grounding in experimentation, causal inference, and statistical analysis, including test design and metric design for feature impact.
Problem Solving: A demonstrated track record of framing ambiguous problems, designing analytical approaches, and solving open-ended data science problems that drive business impact.
Learning Agility: Ability to effectively and responsibly use AI tools to enhance productivity and a passion for continuously improving methods in a fast-evolving field.
GenAI Familiarity: Familiarity with GenAI models and safety/quality evaluation methods. Expertise in the model training lifecycle is a plus (e.g., fine-tuning, RLHF, or synthetic data generation).
Engineering Development Workflow: Experience with engineering development workflows and engineering efficiency data is a plus for the Engineering Efficiency and Code Intelligence role.
Applied Research Background: A track record of applied research or publications in relevant technical fields is highly valued.

For roles that are based at our headquarters in San Mateo, CA: The starting base pay for this position is as shown below. The actual base pay is dependent upon a variety of job-related factors such as professional background, training, work experience, location, business needs and market demand. Therefore, in some circumstances, the actual salary could fall outside of this expected range. This pay range is subject to change and may be modified in the future. All full-time employees are also eligible for equity compensation and for benefits as described on this page.

Annual Salary Range

$221,380—$263,670 USD

Roles that are based in an office are onsite Tuesday, Wednesday, and Thursday, with optional presence on Monday and Friday (unless otherwise noted).

Roblox provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws. Roblox also provides reasonable accommodations to candidates with qualifying disabilities or religious beliefs during the recruiting process.

For US-based roles only, please note the Company may not be able to employ candidates for this role who have United States work authorization related to certain U.S. visa categories, or support future H-1B sponsorship at this time.

Senior Data Scientist - Generative AI
