What you'd actually do

Design and execute system-level measurement frameworks for foundational model improvements spanning offline evaluation benchmarks, online A/B experiments, and longitudinal impact tracking across surfaces.

Define, and own the success metrics that quantify foundational model value.

Build causal inference methodologies to isolate the incremental impact of individual model components within a complex, multi-model production system where changes co-occur and interact.

Work cross-functionally to build relationships, proactively communicate key findings, and collaborate closely with ML Engineers, Applied Scientists, Homefeed and Surface teams to ensure measurement rigor is embedded in every model launch.

Relentlessly focus on impact, whether through sharpening investment decisions with data, raising the bar for launch criteria, accelerating experimentation velocity, or surfacing hidden inefficiencies in the model ecosystem.

Skills

Required

5+ years of experience analyzing data in a fast-paced, data-driven environment with proven ability to apply scientific methods to solve real-world problems on web-scale data.
Strong interest and hands-on experience in one or more of: ML system evaluation, recommender system measurement, A/B experimentation at scale, causal inference
Deep familiarity with large-scale recommendation or ranking systems and their evaluation including an understanding of how representation learning, retrieval, ranking, and re-ranking stages interact and compound in production.
Experience designing and executing A/B experiments for complex ML systems, including multi-surface holdouts, metric decomposition, long-run effect estimation, and interference/spillover mitigation.
Strong quantitative programming (Python) and data manipulation skills (SQL/Spark); experience with ML pipelines, feature stores, and large-scale experimentation platforms.
Ability to work independently, drive ambiguous projects end-to-end, and operate with high ownership in a fast-moving research-to-production environment.
Excellent written and verbal communication skills, with the ability to translate complex system-level findings into clear narratives for technical and non-technical partners including leadership-level investment recommendations.

Nice to have

Team player eager to partner across teams to turn measurement insights into better models and faster launches.

What the JD emphasized

system-level measurement frameworks

foundational model improvements

offline evaluation benchmarks

online A/B experiments

longitudinal impact tracking

success metrics

foundational model value

causal inference methodologies

incremental impact

complex, multi-model production system

measurement rigor

model launch

investment decisions

launch criteria

experimentation velocity

model ecosystem

Other signals

measurement backbone for foundational model initiatives

design evaluation frameworks, experimentation strategies, and system-level metrics

quantify how improvements to our largest and most complex models translate into real user and business outcomes

build causal inference methodologies to isolate the incremental impact of individual model components

About Pinterest:

Millions of people around the world come to our platform to find creative ideas, dream about new possibilities and plan for memories that will last a lifetime. At Pinterest, we’re on a mission to bring everyone the inspiration to create a life they love, and that starts with the people behind the product.

Discover a career where you ignite innovation for millions, transform passion into growth opportunities, celebrate each other’s unique experiences and embrace the flexibility to do your best work. Creating a career you love? It’s Possible.

At Pinterest, AI isn't just a feature, it's a powerful partner that augments our creativity and amplifies our impact, and we’re looking for candidates who are excited to be a part of that. To get a complete picture of your experience and abilities, we’ll explore your foundational skills and how you collaborate with AI.

Through our interview process, what matters most is that you can always explain your approach, showing us not just what you know, but how you think. You can read more about our AI interview philosophy and how we use AI in our recruiting process here.

Pinterest is the world's leading visual search and discovery platform, serving over 500 million monthly active users globally on their journey from inspiration to action. As we scale and unify our foundational AI models across every recommendation surface, rigorously measuring the system-level impact of these advances is essential to our ability to ship with confidence and compound gains over time.

We are looking for a Senior Data Scientist to serve as the measurement backbone for Pinterest's foundational model initiatives within the Advanced Technology Group (ATG). In this role, you will design the evaluation frameworks, experimentation strategies, and system-level metrics that quantify how improvements to our largest and most complex models translate into real user and business outcomes. You will work in a highly collaborative and cross-functional environment, partnering with ML Engineers, Applied Scientists, Product Managers, and ML Platform engineers.

You are expected to develop a deep understanding of Pinterest's recommendation ecosystem, from representation learning to retrieval, ranking, reinforcement learning, and beyond, and bring the analytical rigor necessary to decompose, attribute, and communicate the impact of foundational model changes across this interconnected system. The results of your work will directly influence model investment decisions, launch criteria, and the pace of innovation across ATG and Pinterest.

What You'll Do

Design and execute system-level measurement frameworks for foundational model improvements spanning offline evaluation benchmarks, online A/B experiments, and longitudinal impact tracking across surfaces.
Define, and own the success metrics that quantify foundational model value.
Build causal inference methodologies to isolate the incremental impact of individual model components within a complex, multi-model production system where changes co-occur and interact.
Work cross-functionally to build relationships, proactively communicate key findings, and collaborate closely with ML Engineers, Applied Scientists, Homefeed and Surface teams to ensure measurement rigor is embedded in every model launch.
Relentlessly focus on impact, whether through sharpening investment decisions with data, raising the bar for launch criteria, accelerating experimentation velocity, or surfacing hidden inefficiencies in the model ecosystem.

What We're Looking For

5+ years of experience analyzing data in a fast-paced, data-driven environment with proven ability to apply scientific methods to solve real-world problems on web-scale data.
Strong interest and hands-on experience in one or more of: ML system evaluation, recommender system measurement, A/B experimentation at scale, causal inference
Deep familiarity with large-scale recommendation or ranking systems and their evaluation including an understanding of how representation learning, retrieval, ranking, and re-ranking stages interact and compound in production.
Experience designing and executing A/B experiments for complex ML systems, including multi-surface holdouts, metric decomposition, long-run effect estimation, and interference/spillover mitigation.
Strong quantitative programming (Python) and data manipulation skills (SQL/Spark); experience with ML pipelines, feature stores, and large-scale experimentation platforms.
Ability to work independently, drive ambiguous projects end-to-end, and operate with high ownership in a fast-moving research-to-production environment.
Excellent written and verbal communication skills, with the ability to translate complex system-level findings into clear narratives for technical and non-technical partners including leadership-level investment recommendations.
A team player eager to partner across teams to turn measurement insights into better models and faster launches.

This position is not eligible for relocation assistance.

#LI-NM4

At Pinterest we believe the workplace should be equitable, inclusive, and inspiring for every employee. In an effort to provide greater transparency, we are sharing the base salary range for this position. The position is also eligible for equity. Final salary is based on a number of factors including location, travel, relevant prior experience, or particular skills and expertise.

Information regarding the culture at Pinterest and benefits available for this position can be found here.

US based applicants only

$139,764—$287,749 USD

Our Commitment to Inclusion:

Pinterest is an equal opportunity employer and makes employment decisions on the basis of merit. We want to have the best qualified people in every job. All qualified applicants will receive consideration for employment without regard to race, color, ancestry, national origin, religion or religious creed, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, age, marital status, status as a protected veteran, physical or mental disability, medical condition, genetic information or characteristics (or those of a family member) or any other consideration made unlawful by applicable federal, state or local laws. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you require a medical or religious accommodation during the job application process, please complete this form for support.