Principal Data Scientist - Agent Builder

Elastic · Enterprise · Netherlands · Enterprise Search - Workchat

This Principal Data Scientist role builds and evaluates Elastic's conversational and agentic platform, focusing on RAG, agents, tools, and quality metrics. It involves defining evaluation strategies, leading quality metric design, building retrieval approaches, influencing product decisions, and partnering with engineering on productionization.

What you'd actually do

  1. Define the evaluation strategy for conversational and agentic search, including offline and online evaluation, golden datasets, rubrics, LLM-as-judge calibration, groundedness and citation checks, and A/B testing.
  2. Lead the design of quality metrics and decision frameworks for RAG, agents, tools, model selection, agent routing, prompt behavior, and cost/latency trade-offs.
  3. Build, compare, and guide improvements across retrieval and re-ranking approaches, including sparse and dense retrieval, vector search, query understanding, semantic rewrites, and context enrichment.
  4. Turn experimental results into product and business decisions: which models to use, how to route requests efficiently, which tools should be exposed, and how agents should be customized for different Elastic use cases.
  5. Partner with engineering to productionize evaluation pipelines, telemetry, dashboards, CI guardrails, and regression detection for chat quality, helpfulness, dedication, latency, and cost.

Skills

Required

  • 8+ years of applied DS/ML experience
  • Deep expertise in IR, NLP, ranking, semantic search, RAG, or LLM-powered product experiences
  • Strong track record defining and leading evaluation for production AI/ML systems
  • Experience influencing product and technical strategy through data
  • Hands-on ability with Python, PyTorch/Transformers, Pandas, notebooks, reproducible experiments, versioned datasets, and clean, reviewable code
  • Strong understanding of retrieval systems
  • Experience collaborating closely with engineering teams to move from prototype to production
  • Excellent written and verbal communication

Nice to have

  • Practical Elasticsearch experience, or experience with similar search and distributed data systems
  • Familiarity with ES|QL
  • A collaborative, low-ego style and a strong ability to mentor, raise standards, and develop transparency for others in a distributed team

What the JD emphasized

  • Define the evaluation strategy
  • evaluation strategy
  • evaluation for production AI/ML systems
  • evaluation methodology
  • evaluation metrics

Other signals

  • building agentic platform
  • evaluating and improving chat quality
  • defining evaluation strategy for RAG and agents
  • influencing roadmap and shipping improvements