Evals

Designing benchmarks and automated scoring systems to measure model quality, safety, or capability — typically blending classical metrics, LLM-as-judge, and human review.

Primary AI lifecycle stage: evaluation.

As of today, 1,975 active AI roles across 203 companies in our index reference Evals. Hiring concentrates at the agents (55%) and evaluation (12%) stages. Most common sectors: Big Tech, Enterprise, AI Frontier.

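Top hiring: Amazon (177 roles), Google (125 roles), OpenAI (90 roles), Microsoft (75 roles), Anthropic (67 roles).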

Roles by function: Engineering (2089) · Research (467) · Product (352)

Showing Engineering roles: 2089 AI roles tagged evals.

Company	Title	Sector	AI score	Other tags
Google	Senior Staff Software Engineer, Cognitive Architecture, Special Projects	Big Tech	10	Interpretability · Agent orchestration · Agent research · Audio & speech · RL robotics · Model serving
Wayve	Engineering Internship, Enrichment and Curation	Robotics	9	Embodied AI · Multimodal · Vision · Pretraining · Fine-tuning
Google	Forward Deployed Engineer III, GenAI	Big Tech	9	Agent orchestration · RAG · Vector DB · LLM observability · Model serving
Google	Partner Forward Deployed Engineer IV, Generative AI, Google Cloud	Big Tech	9	Agent orchestration · RAG · Vector DB · LLM observability
ClickHouse	AI Product Engineer - ClickStack	Data AI	9	Agent orchestration · Tool use · LLM observability · RAG
ClickHouse	AI Product Engineer - ClickStack	Data AI	9	Agent orchestration · Tool use · LLM observability · RAG
ClickHouse	AI Product Engineer - ClickStack	Data AI	9	Agent orchestration · Tool use · LLM observability · RAG
ClickHouse	AI Product Engineer - ClickStack	Data AI	9	Agent orchestration · Tool use · LLM observability · RAG · Model serving
ClickHouse	AI Product Engineer - ClickStack	Data AI	9	Agent orchestration · Tool use · LLM observability · Model serving
Google	Software Engineer III, Multimodal Agentic AI, XR	Big Tech	9	Agent orchestration · Multimodal · Vision · Inference infra · Model serving · Fine-tuning · Agent research
Google	Partner Forward Deployed Engineer, Generative AI, Google Cloud (Japanese, English)	Big Tech	9	Agent orchestration · RAG · Vector DB · LLM observability · Model serving
Okta	Staff Product Security Engineer	Enterprise	9	Agent orchestration · Agent research · Guardrails · LLM observability · Tool use
Google	Forward Deployed Engineer IV, Generative AI, Google Cloud	Big Tech	9	Agent orchestration · RAG · Vector DB · LLM observability · Model serving
Adobe	Applied Scientist 5.5	Enterprise	9	Fine-tuning · Inference infra · Model serving · Multimodal
Google	Staff Software Engineer, Generative AI	Big Tech	9	Agent orchestration · Inference infra · Model serving · Fine-tuning · Multimodal · Vision
Google	Senior Staff Software Engineer, Agentic Data Tooling, DeepMind	Big Tech	9	Agent orchestration · Tool use · LLM observability · RAG · Fine-tuning · RL post-training · Embodied AI
Google	Partner Forward Deployed Engineer, Generative AI, Google Cloud	Big Tech	9	Agent orchestration · RAG · Vector DB · LLM observability · Model serving
JPMorgan Chase	Applied Machine Learning Scientist - Vice President	Banking	9	Agent orchestration · Tool use · Guardrails · LLM observability · RAG · Fine-tuning · Model serving · Recommender systems · Multimodal · Agent research · RL post-training
Ramp	Senior Growth Operator, Partner	Fintech	9	Agent orchestration · Tool use · Guardrails
Augury	Senior GenAI Engineer	Vertical AI	9	Agent orchestration · Tool use · RAG · Vector DB · Fine-tuning · Model serving · LLM observability · Multimodal
Wayve	Tech Lead, Autonomy Performance - Robotaxi	Robotics	9	Embodied AI · Model serving · Inference infra
Amazon	Applied Sciences Manager, Ads Brand Safety and Suitability	Big Tech	9	LLM observability · Model serving · Inference infra · Multimodal · Guardrails · Fine-tuning
Netflix	Data Scientist 5 - AI Evals	Big Tech	9	LLM observability · Agent orchestration · RAG · Agent research · Guardrails
Oracle	Snr Director, Applied Science	Enterprise	9	Multimodal · Agent orchestration · Model serving · Inference infra · RAG · Guardrails · LLM observability · Vision · Audio & speech
Google	Senior Staff Software Engineer, AI/ML, Applied AI	Big Tech	9	Agent orchestration · Multimodal · Audio & speech · Model serving
Microsoft	Member of Technical Staff, Microsoft Robotics (Robot Learning)	Big Tech	9	Embodied AI · RL robotics · Vision · Multimodal · Fine-tuning · Model serving
Google	Forward Deployed Developer III, Generative AI, Google Cloud	Big Tech	9	Agent orchestration · RAG · Vector DB · LLM observability
Apptronik	Staff MLOps Engineer	Robotics	9	Model serving · Inference infra
Google	Forward Deployed Engineer, Generative AI, Google Cloud (Korean, English)	Big Tech	9	Agent orchestration · RAG · Vector DB · Model serving · LLM observability
Wayve	Machine Learning Engineer, App SW	Robotics	9	Embodied AI · Model serving · Synthetic data

Frequently asked questions

What is Evals in AI?
Designing benchmarks and automated scoring systems to measure model quality, safety, or capability — typically blending classical metrics, LLM-as-judge, and human review. Primary AI lifecycle stage: evaluation.
How many AI roles reference Evals right now?
1,975 active AI roles across 203 companies in our index reference Evals as of today.
Which companies are hiring for Evals roles?
The companies with the most active Evals listings are: Amazon (177 roles), Google (125 roles), OpenAI (90 roles), Microsoft (75 roles), Anthropic (67 roles).
What AI lifecycle stage does Evals belong to?
Evals primarily belongs to the evaluation stage of the AI lifecycle. In current hiring, however, Evals roles concentrate in the agents (55%) and evaluation (12%) stages.
What sectors invest most in Evals?
The sectors with the most active Evals hiring are: Big Tech, Enterprise, AI Frontier.