Skills

Required

BS/MS/PhD in a scientific field, or equivalent experience
10+ years of relevant engineering or applied science experience, including time as a technical lead
Proven track record of leading ML or GenAI initiatives in a product-driven environment, from research through production
Significant experience with evaluation, experimentation, or measurement of ML systems at scale
Strong product mindset
Comfortable driving initiatives across cross-functional teams
Ability to make sound technical calls when the path isn’t yet defined

Nice to have

Experience with Datadog products
Experience in observability
Experience with hybrid workplace environments

Other signals

Defining and guaranteeing the quality of an AI system at scale

Evaluating agents end-to-end with non-deterministic trajectories

Scoring tool-selection accuracy against numerous data sources and visualizations

Building measurement systems for regressions across all widget types and data sources

Driving improvements in retrieval relevance, tool-selection accuracy, and context efficiency

The Dashboards product is Datadog's unified single-pane-of-glass for metrics, logs, and traces—a comprehensive treasure trove of observability data. We are transforming Dashboards into an AI-native control surface and the central hub where every team moves seamlessly from question to insight to action – providing a guided experience that feels like having an expert SRE at your side and ensuring the entry point is never an empty canvas.

We're hiring a Staff Applied Scientist to define and guarantee the quality of this AI system at scale. "Good" isn't one number — it spans answer quality, tool-selection accuracy (critical given the growing catalog of data sources and visualizations), retrieval relevance, latency, token cost, and end-to-end agent success.

The space is full of open questions. How do you evaluate an agent end-to-end when the trajectory is non-deterministic? How do you score tool selection when a single user query can require the agent to choose among dozens of visualizations and data sources, both of which are growing month over month? How do you build a measurement system that catches regressions across all widget types and data sources (e.g., enforcing correct grouping, sorting, and time overrides), and that dozens of teams can easily use and extend? If those are the problems you want to spend your time on, come build this with us.

At Datadog, we place value in our office culture - the relationships and collaboration it builds and the creativity it brings to the table. We operate as a hybrid workplace to ensure our Datadogs can create a work-life harmony that best fits them.

**What You’ll Do:**

Own the evaluation strategy for Dashboards, as well as sister teams within our organization. Define the metrics — offline and online, quality and cost, single-turn and multi-turn — that the team and the broader organization optimize against.
Build the eval datasets, golden traces, and regression harnesses that catch quality changes before they hit customers, and make those assets reusable by every team that is building dashboards and widgets through agents.
Drive measurable improvements to retrieval relevance, tool-selection accuracy, and context efficiency, partnering closely with the engineers on the team.
Provide technical leadership across the Dashboards team and the broader organization through design reviews, working groups, and mentorship.

**Who You Are:**

You have a BS/MS/PhD in a scientific field, or equivalent experience.
You have 10+ years of relevant engineering or applied science experience, including time as a technical lead.
You have a proven track record of leading ML or GenAI initiatives in a product-driven environment, from research through production.
You have significant experience with evaluation, experimentation, or measurement of ML systems at scale.
You bring a strong product mindset and are comfortable driving initiatives across cross-functional teams.
You thrive in ambiguity and can make sound technical calls when the path isn’t yet defined.

Datadog values people from all walks of life. We understand not everyone will meet all the above qualifications on day one. That's okay. If you’re passionate about technology and want to grow your skills, we encourage you to apply.

**Benefits and Growth:**

New hire stock equity (RSUs) and employee stock purchase plan (ESPP)
Continuous professional development, product training, and career pathing
An inclusive company culture, giving programs, and the ability to join our Community Guilds (Datadog employee resource groups)
Competitive global benefits and global Spring Health benefits for employees and dependents age 6+

Benefits and Growth listed above may vary based on the country of your employment and the nature of your employment with Datadog.

Datadog offers a competitive salary and equity package, and may include variable compensation. Actual compensation is based on factors such as the candidate's skills, qualifications, and experience. In addition, Datadog offers a wide range of best-in-class, comprehensive, and inclusive employee benefits for this role, including healthcare, dental, parental planning, and mental health benefits, a 401(k) plan and match, paid time off, fitness reimbursements, and a discounted employee stock purchase plan.

The reasonably estimated yearly salary for this role at Datadog is:

$276,000—$345,000 USD

**About Datadog:**

Datadog is the leading observability and security platform for the AI era, providing businesses with unified visibility across the technology stack to manage complexity at scale. It brings applications, infrastructure, data, models, and security into one place, using AI to detect and resolve issues before they impact customers. Trusted globally by Fortune 500 companies and high-growth AI leaders, Datadog enables businesses to move faster with clarity and confidence. Learn more about #DatadogLife on Instagram, LinkedIn, and Datadog Learning Center.

**Equal Opportunity at Datadog:**

Datadog is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and other characteristics protected by law. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. Here are our Candidate Legal Notices for your reference.

Datadog endeavors to make our Careers Page accessible to all users. If you would like to contact us regarding the accessibility of our website or need assistance completing the application process, please complete this form. This form is for accommodation requests only and cannot be used to inquire about the status of applications.

**Privacy and AI Guidelines:**

Any information you submit to Datadog as part of your application will be processed in accordance with Datadog’s Applicant and Candidate Privacy Notice. For information on our AI policy, please visit Interviewing at Datadog AI Guidelines.