Senior Product Operations Manager, Evaluation

Harvey · AI Frontier · San Francisco, CA · Product

This role builds and scales the evaluation engine for Harvey's AI platform, focusing on operationalizing evaluation methodologies, managing data providers, and ensuring model accuracy, reliability, and trust for global product launches. It requires strong technical, systems, and operational skills to embed evaluation as a core product capability.

What you'd actually do

  1. Build and scale the systems that power model and product evaluations across Harvey
  2. Run intake, triage, and prioritization for the evaluation request queue, routing capacity to the highest-value coverage gaps
  3. Embed evaluation workflows and readiness checkpoints into the product development lifecycle
  4. Create the single source of truth for evaluation status, results, history, and launch readiness
  5. Turn expert-designed evaluation methodologies into scalable, repeatable operational processes

Skills

Required

  • Technical program management
  • Product operations
  • Research operations
  • Evaluation/benchmarking roles
  • ML/AI evaluations
  • Benchmarking frameworks
  • Scientific workflows
  • Statistical methodologies
  • SQL
  • Python
  • Business acumen
  • ROI-focused mindset
  • Cross-functional coordination
  • Attention to detail
  • Clarity, rigor, and reproducibility
  • Navigating evolving landscapes
  • Communication skills
  • Translating technical nuance

Nice to have

  • AI tool support

What the JD emphasized

  • mission-critical
  • evaluation complexity is increasing 10x
  • high-ownership role
  • thrives in ambiguity
  • loves building structure
  • scale the evaluation infrastructure
  • 4–7+ years in technical program management, product operations, research operations, or evaluation/benchmarking roles
  • Experience working with ML/AI evaluations, benchmarking frameworks, or scientific workflows
  • Comfort with statistical methodologies and SQL or Python
  • Ability to work deeply with legal experts and operationalize complex evaluation methodologies
  • High attention to detail and a bias toward clarity, rigor, and reproducibility
  • Ability to navigate an evolving landscape and bring order to complex systems
  • Desire to do whatever it takes to make evaluation systems successful

Other signals

  • evaluation engine
  • model behavior reliably, accurately, and jurisdictionally correctly
  • operationalize evaluation methodologies