Director, Research - Evaluation & Training

Snorkel AI · Data AI · San Francisco, CA · 316 - Research

Snorkel AI is seeking a Director of Research to lead a team focused on data evaluation, error analysis, and data valuation methods to predict model performance. The role involves owning a roadmap for novel evaluation techniques, synthesizing trends from model failures, and quantifying the impact of Snorkel's data on model performance. This leadership position requires a strong understanding of the AI/ML landscape, team management experience, and the ability to translate technical findings into business outcomes.

What you'd actually do

  1. Own a multi-quarter roadmap centered on novel evaluation, error analysis, and data valuation techniques
  2. Synthesize and share trends from model-failure analysis and benchmarking as recommendations on which datasets the community should focus on and which ones Snorkel should invest in, making this team a primary input to the company's data strategy
  3. Focus on data valuation techniques that quantify how much Snorkel data improves model performance
  4. Lead and grow a team of researchers, setting a high bar for quality, rigor, and speed of execution
  5. Act as the primary bridge between the team's findings and Product, GTM, and our customers

Skills

Required

  • Applied AI/ML research
  • Technical team management
  • LLM evaluation
  • Benchmarking
  • Model behavior analysis
  • Data strategy
  • Communication and storytelling

Nice to have

  • Data valuation research
  • Data attribution research
  • Experience working with frontier labs
  • Experience with public benchmarks
  • Experience with commercial AI data/eval products

What the JD emphasized

  • 7+ years in applied AI, ML, or research roles, with 4+ years managing technical teams
  • Strong business and market judgment in the AI/ML space
  • Technically conversant and credible: enough depth in LLM evaluation, benchmarking, and model behavior analysis to set direction, judge experimental quality, and pressure-test results
  • A nose for trends: able to look across many evaluation results and failure cases and extract the signal that should drive what gets built next

Other signals

  • data-centric AI
  • data evaluation
  • error analysis
  • data valuation
  • benchmarking
  • model performance prediction