Senior Data Scientist - Shopping Experience (Search)

Instacart · Consumer · United States · Remote · Data Science

Senior Data Scientist on Instacart's Shopping Experience team, focused on Search. The role owns the analytics and experimentation strategy for search relevance, ranking quality, and latency, partnering with Product, Engineering, and ML. Responsibilities include defining metrics, designing and analyzing A/B tests, building diagnostic analyses, connecting offline model evaluation with online metrics, and improving data quality for Search. The role requires strong analytical skills, product sense, and communication skills, along with experience in A/B testing, SQL, Python or R, and modern AI tooling. Preferred qualifications include experience in search relevance, recommendations, NLP, and embeddings, and in bridging offline and online evaluation.

What you'd actually do

  1. Own core Search metrics and funnels end to end (e.g., query → impression → engagement → cart adds), including defining guardrails, monitoring performance across platforms and segments, and diagnosing conversion gaps.
  2. Design, run, and interpret experiments across ranking, retrieval, and search UX (e.g., relevance model changes, query understanding, result layouts), turning ambiguous or conflicting outcomes into crisp, data-driven recommendations.
  3. Partner with Product, Engineering, and ML to prioritize opportunities, size impact, and influence the roadmap for relevance, quality, and latency improvements that unlock measurable business outcomes.
  4. Build deep diagnostic analyses by query class, price point, surface, and customer lifecycle to pinpoint where and why Search underperforms and specify concrete changes that will move key outcomes.
  5. Connect offline model evaluation with online and business metrics by collaborating with ML partners on evaluation design, ensuring model changes reliably improve the end-user experience, not just offline scores.
  6. Improve data quality, instrumentation, and metric definitions for Search so that teams can reason about performance with clarity, consistency, and speed.

Skills

Required

  • Advanced SQL proficiency, including complex joins and window functions, and experience working with large-scale datasets in modern data warehouses (e.g., Snowflake, BigQuery, Redshift).
  • Proficiency in Python or R for analysis, experimentation, and modeling.
  • Hands-on experience designing and analyzing A/B tests end to end, including metric selection, power and sample sizing, covariate adjustment, and decision-making under uncertainty.
  • Demonstrated ability to define success metrics, decompose ambiguous product problems, and deliver clear, opinionated recommendations to Product and Engineering partners.
  • Excellent written and verbal communication skills; able to tailor complex analyses for both technical and non-technical audiences.
  • Bachelor’s degree in a quantitative field (e.g., Statistics, Computer Science, Mathematics, Economics, Engineering) or equivalent practical experience.
  • Comfort using modern AI tooling (e.g., Claude, code assistants, PromptQL) to accelerate analysis, experimentation, and communication while exercising strong judgment on quality and reliability.

Nice to have

  • Experience in search relevance, ranking, recommendations, personalization, or information retrieval (e.g., e-commerce or marketplace search).
  • Familiarity with NLP, embeddings, and semantic search, including how to evaluate and iterate on these techniques in production.
  • Experience bridging offline evaluation metrics (e.g., NDCG, precision/recall, human evaluation) with online experiments and business outcomes.
  • Background in causal inference beyond standard A/B tests (e.g., holdouts, diff-in-diff, quasi-experiments) to measure long-term or cross-surface effects.
  • Comfort working across web and native app surfaces, navigating tradeoffs between relevance, monetization, and latency.
  • Proven impact improving logging, instrumentation, and metric definitions in complex data environments.

What the JD emphasized

  • Own core Search metrics and funnels end to end
  • defining guardrails
  • diagnosing conversion gaps
  • Design, run, and interpret experiments
  • turning ambiguous or conflicting outcomes into crisp, data-driven recommendations
  • Partner with Product, Engineering, and ML
  • influence the roadmap
  • unlock measurable business outcomes
  • Build deep diagnostic analyses
  • specify concrete changes
  • Connect offline model evaluation with online and business metrics
  • ensuring model changes reliably improve end-user experience
  • Improve data quality, instrumentation, and metric definitions
  • reason about performance with clarity, consistency, and speed
  • rigorous analytics
  • strong product sense
  • clear communication
  • drive decisive action
  • rolling up your sleeves
  • collaborating across disciplines
  • using experimentation to uncover what truly helps customers find the right items quickly
  • Advanced SQL proficiency
  • complex joins and window functions
  • working with large-scale datasets
  • modern data warehouses
  • Proficiency in Python or R
  • Hands-on experience designing and analyzing A/B tests end to end
  • metric selection
  • power and sample sizing
  • covariate adjustment
  • decision-making under uncertainty
  • define success metrics
  • decompose ambiguous product problems
  • deliver clear, opinionated recommendations
  • Excellent written and verbal communication skills
  • tailor complex analyses for both technical and non-technical audiences
  • Bachelor’s degree in a quantitative field
  • equivalent practical experience
  • Comfort using modern AI tooling
  • exercise strong judgment on quality and reliability
  • Experience in search relevance, ranking, recommendations, personalization, or information retrieval
  • e-commerce or marketplace search
  • Familiarity with NLP, embeddings, and semantic search
  • how to evaluate and iterate on these techniques in production
  • Experience bridging offline evaluation metrics
  • online experiments and business outcomes
  • Background in causal inference beyond standard A/B tests
  • measure long-term or cross-surface effects
  • Comfort working across web and native app surfaces
  • navigating tradeoffs between relevance, monetization, and latency
  • Proven impact improving logging, instrumentation, and metric definitions
  • complex data environments

Other signals

  • own the analytics and experimentation strategy that powers how we interpret customer intent and connect it to the most relevant items and retailers
  • shape the roadmap for search relevance, ranking quality, and latency
  • translate complex, noisy signals into clear insights and recommendations that move the metrics that matter: search conversion, order rate, and GTV
  • partner with Product, Engineering, and ML to prioritize opportunities, size impact, and influence the roadmap for relevance, quality, and latency improvements