Sr. Data Scientist, AI/ML Systems

Pinterest · Consumer · San Francisco, CA · ATG

Senior Data Scientist role focused on building the measurement and evaluation frameworks for foundational AI models at Pinterest. The role designs system-level metrics, experimentation strategies, and causal inference methodologies to quantify the impact of model improvements on user and business outcomes, directly influencing investment decisions and launch criteria. It requires a deep understanding of recommendation systems and large-scale ML evaluation.

What you'd actually do

  1. Design and execute system-level measurement frameworks for foundational model improvements spanning offline evaluation benchmarks, online A/B experiments, and longitudinal impact tracking across surfaces.
  2. Define and own the success metrics that quantify foundational model value.
  3. Build causal inference methodologies to isolate the incremental impact of individual model components within a complex, multi-model production system where changes co-occur and interact.
  4. Work cross-functionally to build relationships, proactively communicate key findings, and collaborate closely with ML Engineers, Applied Scientists, and the Homefeed and Surface teams to ensure measurement rigor is embedded in every model launch.
  5. Relentlessly focus on impact, whether through sharpening investment decisions with data, raising the bar for launch criteria, accelerating experimentation velocity, or surfacing hidden inefficiencies in the model ecosystem.

Skills

Required

  • 5+ years of experience analyzing data in a fast-paced, data-driven environment, with a proven ability to apply scientific methods to solve real-world problems on web-scale data.
  • Strong interest in, and hands-on experience with, one or more of: ML system evaluation, recommender system measurement, A/B experimentation at scale, causal inference.
  • Deep familiarity with large-scale recommendation or ranking systems and their evaluation, including an understanding of how representation learning, retrieval, ranking, and re-ranking stages interact and compound in production.
  • Experience designing and executing A/B experiments for complex ML systems, including multi-surface holdouts, metric decomposition, long-run effect estimation, and interference/spillover mitigation.
  • Strong quantitative programming (Python) and data manipulation skills (SQL/Spark); experience with ML pipelines, feature stores, and large-scale experimentation platforms.
  • Ability to work independently, drive ambiguous projects end-to-end, and operate with high ownership in a fast-moving research-to-production environment.
  • Excellent written and verbal communication skills, with the ability to translate complex system-level findings into clear narratives for technical and non-technical partners, including leadership-level investment recommendations.

Nice to have

  • Team player eager to partner across teams to turn measurement insights into better models and faster launches.

What the JD emphasized

  • system-level measurement frameworks
  • foundational model improvements
  • offline evaluation benchmarks
  • online A/B experiments
  • longitudinal impact tracking
  • success metrics
  • foundational model value
  • causal inference methodologies
  • incremental impact
  • complex, multi-model production system
  • measurement rigor
  • model launch
  • investment decisions
  • launch criteria
  • experimentation velocity
  • model ecosystem

Other signals

  • measurement backbone for foundational model initiatives
  • design evaluation frameworks, experimentation strategies, and system-level metrics
  • quantify how improvements to our largest and most complex models translate into real user and business outcomes
  • build causal inference methodologies to isolate the incremental impact of individual model components