
Evals

Designing benchmarks and automated scoring systems to measure model quality, safety, or capability — typically blending classical metrics, LLM-as-judge, and human review.
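
The definition above names three scoring layers. The following is a minimal Python sketch of how they might be combined on a toy question-answer set. Exact match plays the role of a classical metric. `overlap_judge` is a hypothetical token-overlap stand-in for an LLM-as-judge call, not a real judge model. Items where the metric and judge disagree, or where the judge score is ambiguous, are routed to a human-review queue. The `review_band` thresholds are arbitrary illustrative values, not an industry standard.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Tuple


@dataclass
class EvalItem:
    prompt: str
    reference: str  # gold answer
    output: str     # model answer being graded


@dataclass
class EvalResult:
    exact_match: float
    judge_score: float
    needs_human_review: bool


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so formatting noise is not scored as an error.
    return " ".join(text.lower().split())


def exact_match(item: EvalItem) -> float:
    # Classical metric: 1.0 only when the normalized strings are identical.
    return 1.0 if normalize(item.output) == normalize(item.reference) else 0.0


def overlap_judge(item: EvalItem) -> float:
    # Hypothetical stand-in for an LLM-as-judge call: the fraction of reference
    # tokens that appear in the output. A real harness would send the prompt,
    # reference, and output to a judge model and parse its rating instead.
    ref = set(normalize(item.reference).split())
    out = set(normalize(item.output).split())
    return len(ref & out) / len(ref) if ref else 0.0


def run_eval(
    items: List[EvalItem],
    judge: Callable[[EvalItem], float] = overlap_judge,
    review_band: Tuple[float, float] = (0.3, 0.8),  # assumed thresholds
) -> List[EvalResult]:
    low, high = review_band
    results = []
    for item in items:
        em = exact_match(item)
        js = judge(item)
        # Flag for human review when the metric and the judge disagree,
        # or when the judge score falls inside the ambiguous band.
        disagree = (em == 1.0) != (js >= high)
        uncertain = low <= js < high
        results.append(EvalResult(em, js, disagree or uncertain))
    return results


def summarize(results: List[EvalResult]) -> Dict[str, float]:
    return {
        "exact_match": mean(r.exact_match for r in results),
        "judge_score": mean(r.judge_score for r in results),
        "human_review_queue": sum(r.needs_human_review for r in results),
    }


if __name__ == "__main__":
    items = [
        EvalItem("Capital of France?", "Paris", "paris"),
        EvalItem("2 + 2?", "4", "The answer is 4"),
        EvalItem("Largest planet?", "Jupiter", "Saturn"),
    ]
    print(summarize(run_eval(items)))
```

In this toy run, the second item fails exact match but passes the judge, so it lands in the review queue. That disagreement is the kind of case where strict classical metrics and a more lenient judge are typically reconciled by a human.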

Primary AI lifecycle stage: evaluation.

As of today, 2,040 active AI roles across 208 companies in our index reference Evals. Hiring concentrates at the agents (57%) and evaluation (12%) stages. Most common sectors: Big Tech, Enterprise, AI Frontier.


Roles by function: Engineering (2,406) · Research (518) · Product (411).


Filtered to the Data AI sector: 206 AI roles tagged Evals.

Company	Title	Sector	AI score	Other tags
Together AI	Research Intern, Model Shaping (Fall 2026)	Data AI	9	Fine-tuning · RL post-training · Frontier research · Model serving
Together AI	Frontier Agents Intern (Fall 2026)	Data AI	9	Agent orchestration · Agent research · Frontier research · RL post-training · Multimodal · Audio & speech · LLM observability
Snowflake	Post-Doctoral Researcher (Fixed-Term)	Data AI	9	Frontier research · Fine-tuning · RAG · Interpretability · Multimodal
Fireworks AI	AI Field Engineer - Enterprise	Data AI	9	Model serving · Inference infra · Fine-tuning
Fireworks AI	AI Field Engineer - Microsoft Foundry	Data AI	9	Model serving · Inference infra · Fine-tuning · Tool use · Agent orchestration
Fireworks AI	AI Field Engineer - AI Natives	Data AI	9	Inference infra · Model serving · Fine-tuning
ClickHouse	AI Product Engineer - ClickStack	Data AI	9	Agent orchestration · Tool use · LLM observability · RAG
ClickHouse	AI Product Engineer - ClickStack	Data AI	9	Agent orchestration · Tool use · LLM observability · RAG · Model serving
ClickHouse	AI Product Engineer - ClickStack	Data AI	9	Agent orchestration · Tool use · LLM observability · Model serving
Snorkel AI	Research Scientist - Frontier Benchmarks	Data AI	9	Frontier research · Synthetic data · LLM observability
Mixpanel	Senior Software Engineer, AI Platform	Data AI	9	Agent orchestration · Model serving · Inference infra · LLM observability · RAG
Amplitude	Staff AI Engineer	Data AI	9	Agent orchestration · Tool use · RAG · LLM observability · Model serving
Snowflake	Senior Engineering Manager - AI Forward Deployed Engineering for EMEA	Data AI	9	Agent orchestration · RAG · Fine-tuning · Guardrails · LLM observability · Model serving
Scale AI	Research Scientist, Safety Post Training	Data AI	9	RL post-training · Interpretability · Guardrails
Databricks	AI Engineer - FDE (Forward Deployed Engineer)	Data AI	9	Agent orchestration · RAG · Fine-tuning · Model serving
Databricks	Senior Staff Applied AI Engineer - Context Retrieval	Data AI	9	RAG · Agent orchestration · Search & ranking · Model serving
Snorkel AI	AI Advocate, Open-Source & Research	Data AI	9	Fine-tuning · RL post-training · Agent research · Agent orchestration · Tool use · Frontier research · LLM observability
Scale AI	Director, Forward Deployed Engineering	Data AI	9	Agent orchestration · Model serving · Inference infra
Databricks	AI Engineer - FDE (Forward Deployed Engineer) - Public Sector	Data AI	9	Agent orchestration · RAG · Fine-tuning · Model serving
LangChain	Deployed Engineer (Phoenix)	Data AI	9	Agent orchestration · Tool use · Guardrails · LLM observability · Agent research
Databricks	AI Engineer - FDE (Forward Deployed Engineer)	Data AI	9	RAG · Agent orchestration · Fine-tuning · Model serving
Snorkel AI	Senior/Staff Research Scientist - Frontier Benchmarks	Data AI	9	Frontier research · Synthetic data
LangChain	Solutions Architect (London)	Data AI	9	Agent orchestration · Inference infra · Model serving · RAG · Vector DB
Snowflake	Staff AI Engineer - Cortex Code Agentic System	Data AI	9	Agent orchestration · LLM observability · Guardrails · Model serving
Scale AI	Director, Enterprise Machine Learning & Research	Data AI	9	RL post-training · Agent research · Frontier research · Model serving
Grafana Labs	Staff AI Engineer | US | Remote	Data AI	9	Agent orchestration · Tool use · RAG · LLM observability · Guardrails
Snowflake	Staff Research Scientist, AI Agents & LLMs	Data AI	9	Agent orchestration · Agent research · Fine-tuning · Model serving · Inference infra
Scale AI	Research Scientist, Frontier Risk Evaluations	Data AI	9	Agent orchestration · Guardrails · Frontier research · LLM observability

Frequently asked questions

What is Evals in AI?
Designing benchmarks and automated scoring systems to measure model quality, safety, or capability — typically blending classical metrics, LLM-as-judge, and human review. Primary AI lifecycle stage: evaluation.
How many AI roles reference Evals right now?
2,040 active AI roles across 208 companies in our index reference Evals as of today.
Which companies are hiring for Evals roles?
The companies with the most active Evals listings are: Amazon (188 roles), Google (153 roles), OpenAI (95 roles), Microsoft (73 roles), JPMorgan Chase (70 roles).
What AI lifecycle stage does Evals belong to?
Evals primarily belongs to the evaluation stage of the AI lifecycle. In current hiring, however, Evals roles concentrate at the agents (57%) and evaluation (12%) stages.
What sectors invest most in Evals?
The sectors with the most active Evals hiring are: Big Tech, Enterprise, AI Frontier.