Evals

Designing benchmarks and automated scoring systems to measure model quality, safety, or capability — typically blending classical metrics, LLM-as-judge, and human review.

Primary AI lifecycle stage: evaluation.

As of today, 2,040 active AI roles across 208 companies in our index reference Evals. Hiring concentrates at the agents (57%) and evaluation (12%) stages. Most common sectors: Big Tech, Enterprise, AI Frontier.
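To make the definition above concrete, here is a minimal sketch of how a classical metric, an LLM-as-judge score, and human review might be combined for a single example. Everything in it is an illustrative assumption and not a method described by this index or by any listed employer: the names `EvalResult`, `evaluate`, and `stub_judge` are hypothetical, the judge is a token-overlap stand-in for a real model call, and the 0.5 disagreement threshold is arbitrary.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalResult:
    exact_match: float        # classical metric: 1.0 if normalized strings match, else 0.0
    judge_score: float        # LLM-as-judge score in [0, 1]
    needs_human_review: bool  # True when the two automated signals disagree strongly


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def stub_judge(prediction: str, reference: str) -> float:
    """Hypothetical stand-in for an LLM judge: Jaccard overlap of word sets."""
    p = set(normalize(prediction).split())
    r = set(normalize(reference).split())
    union = p | r
    return len(p & r) / len(union) if union else 1.0


def evaluate(
    prediction: str,
    reference: str,
    judge: Callable[[str, str], float],
    disagreement_threshold: float = 0.5,  # assumed value, tune per task
) -> EvalResult:
    """Score one example and flag it for human review if the signals disagree."""
    exact = 1.0 if normalize(prediction) == normalize(reference) else 0.0
    score = judge(prediction, reference)
    if not 0.0 <= score <= 1.0:
        raise ValueError("judge must return a score in [0, 1]")
    needs_review = abs(exact - score) >= disagreement_threshold
    return EvalResult(exact, score, needs_review)
```

For example, `evaluate("Paris", "  paris ", stub_judge)` agrees on both signals and is not flagged, while a judge that rates a non-matching answer highly, such as `evaluate("The capital is Paris", "Paris", lambda p, r: 0.9)`, is routed to human review. In practice the stub would be replaced by a call to an actual judge model.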

Function: Engineering (2,406) · Research (518) · Product (411)

Filtered by sector: Consumer

198 AI roles tagged Evals.

Company	Title	Sector	AI score	Other tags
Airbnb	Senior Machine Learning Engineer, Customer Support Engineering	Consumer	9	Agent orchestration · Tool use · Guardrails · RAG · Fine-tuning · Model serving · RLHF · Agent research
Reddit	Senior Machine Learning Engineer, GenAI Security	Consumer	9	Agent orchestration · Tool use · Guardrails · Fine-tuning · Model serving
Zillow	Senior Machine Learning Engineer	Consumer	9	Agent orchestration · Multimodal · Guardrails · LLM observability · Model serving
DoorDash	AI Research Fellowship, (Summer and Fall 2026)	Consumer	9	Agent orchestration · Tool use · Forecasting · Multimodal · Vision · Audio & speech · Frontier research · Synthetic data
Airbnb	Machine Learning Engineer, Customer Support Engineering	Consumer	9	Agent orchestration · Tool use · Guardrails · RAG · Fine-tuning · Model serving · RL post-training · Agent research
Pinterest	Principal Engineer, Agentic Engineering	Consumer	9	Agent orchestration · Agent research · Guardrails · LLM observability · Tool use
Pinterest	Sr. Data Scientist, Responsible AI	Consumer	9	Guardrails · LLM observability · Agent research · Multimodal
Zillow	Principal Machine Learning Engineer, Agentic AI	Consumer	9	Agent orchestration · Multimodal · Guardrails · LLM observability · Model serving · Agent research
Uber	2026 PhD Research Intern, India	Consumer	9	Fine-tuning · RLHF · Agent research · Frontier research
Zillow	Principal Applied Scientist, Agentic AI	Consumer	9	RL post-training · RLHF · Reward modeling · Fine-tuning · Guardrails · Agent orchestration · Multimodal · Vector DB
Uber	Senior Research Scientist, Generative AI	Consumer	9	RL post-training · Fine-tuning · Frontier research · Vision
Zillow	Senior Applied Scientist, Agentic AI	Consumer	9	Agent orchestration · Tool use · Fine-tuning · LLM observability · Agent research
Pinterest	Machine Learning Engineer II, Computer Vision Applied Science	Consumer	9	Vision · Multimodal · Fine-tuning · RLHF · Model serving
Roblox	Principal Machine Learning Engineer, Engineering Acceleration	Consumer	9	Agent orchestration · Agent research · Synthetic data · Fine-tuning · Model serving · Code gen
Uber	Senior Staff Machine Learning Engineer – Moonshot AI	Consumer	9	Multimodal · Vision · Audio & speech · LLM observability · Fine-tuning · RAG · Model serving · Recommender systems
Reddit	Staff Research Engineer, Post-training & Evaluation	Consumer	9	Fine-tuning · LLM observability · Frontier research · RL post-training
Uber	Principal Machine Learning Engineer - AV Labs	Consumer	9	Multimodal · Model serving
Uber	Senior Applied Scientist – AI Red Teaming & Model Risk	Consumer	9	Guardrails · Agent orchestration · Tool use · LLM observability · Agent research
Zillow	Distinguished Scientist	Consumer	9	Agent orchestration · Agent research · Multi-agent · Fine-tuning · RL post-training · LLM observability · Multimodal
Uber	Staff ML Engineer, Generative AI	Consumer	9	Agent orchestration · Tool use · Guardrails · LLM observability · RAG · Fine-tuning · Model serving · Multimodal · Audio & speech
Zillow	AI Applied Scientist - PhD Intern, Generative Computer Vision	Consumer	9	Vision · Multimodal · Fine-tuning
Zillow	AI Applied Scientist - PhD Intern, Foundational IQ	Consumer	9	Fine-tuning · Multimodal · Agent orchestration
Zillow	AI Applied Scientist - PhD Intern, 3D Computer Vision	Consumer	9	Vision · Multimodal · Fine-tuning
Airbnb	Senior Staff Machine Learning Engineer, Data & Eval	Consumer	9	LLM observability · Guardrails · RAG · Agent orchestration · Tool use · Fine-tuning · Synthetic data
Instacart	Machine Learning Engineer, PhD Intern	Consumer	9	LLM observability · RAG · Fine-tuning · Inference infra · Model serving · Recommender systems · Search & ranking · Agent research
DoorDash	Software Engineer, Machine Learning Infrastructure - Gen AI	Consumer	8	Agent orchestration · Tool use · Guardrails · LLM observability · RAG · Vector DB · Fine-tuning · Inference infra · Model serving
Chegg	Senior Software Engineer - Agentic AI Applications	Consumer	8	Agent orchestration · Tool use · Guardrails · LLM observability · RAG · Vector DB · Fine-tuning · Model serving · Multimodal
Roblox	Senior Data Scientist - Generative AI	Consumer	8	LLM observability · Fine-tuning · Agent orchestration · Multimodal · Code gen
Duolingo	Senior AI Engineering Manager	Consumer	8	Recommender systems · Fine-tuning · Model serving · LLM observability
Duolingo	Senior AI Engineering Manager	Consumer	8	Recommender systems · Fine-tuning · Model serving · LLM observability

Frequently asked questions

What is Evals in AI?
Designing benchmarks and automated scoring systems to measure model quality, safety, or capability — typically blending classical metrics, LLM-as-judge, and human review. Primary AI lifecycle stage: evaluation.
How many AI roles reference Evals right now?
2,040 active AI roles across 208 companies in our index reference Evals as of today.
Which companies are hiring for Evals roles?
The companies with the most active Evals listings are Amazon (188 roles), Google (153 roles), OpenAI (95 roles), Microsoft (73 roles), and JPMorgan Chase (70 roles).
What AI lifecycle stage does Evals belong to?
Evals primarily belongs to the evaluation stage of the AI lifecycle. In current hiring, Evals roles concentrate at the agents (57%) and evaluation (12%) stages.
What sectors invest most in Evals?
The sectors with the most active Evals hiring are Big Tech, Enterprise, and AI Frontier.