Evals

Designing benchmarks and automated scoring systems to measure model quality, safety, or capability — typically blending classical metrics, LLM-as-judge, and human review.

Primary AI lifecycle stage: evaluation.

As of today, 2,040 active AI roles across 208 companies in our index reference Evals. Hiring concentrates at the agents (57%) and evaluation (12%) stages. Most common sectors: Big Tech, Enterprise, AI Frontier.

430 AI roles tagged Evals match the active filter (sector: AI Frontier).

Company	Title	Sector	AI score	Other tags
OpenAI	Researcher, Context - Agent Post-Training	AI Frontier	10	RL post-training · Agent research · Synthetic data · Agent orchestration · Tool use · LLM observability
OpenAI	Researcher, Connectors - Agent Post-Training	AI Frontier	10	RL post-training · Agent orchestration · Tool use · Fine-tuning · Model serving · Agent research
OpenAI	Researcher, Computer Use - Agent Post-Training	AI Frontier	10	RL post-training · Agent orchestration · Synthetic data · Fine-tuning · Agent research
OpenAI	Researcher, Misalignment Research	AI Frontier	10	Guardrails · Agent research · Frontier research
Mistral AI	AI Scientist - Zurich	AI Frontier	10	Frontier research · Pretraining · Agent research · Multimodal · Audio & speech · Code gen · Model serving · Fine-tuning
Mistral AI	AI Scientist - Paris/London - Onsite or Hybrid or Remote	AI Frontier	10	Frontier research · Pretraining · Fine-tuning · Model serving · Multimodal · Audio & speech · Agent research
Mistral AI	AI Scientist - Palo Alto	AI Frontier	10	Frontier research · Pretraining · Agent research · Multimodal · Audio & speech · Code gen · Model serving
OpenAI	Researcher, Loss of Control	AI Frontier	10	Agent orchestration · Tool use · Guardrails · LLM observability · Agent research
Anthropic	Research Engineer, Machine Learning (Reinforcement Learning)	AI Frontier	10	Agent orchestration · Tool use · RL post-training · Frontier research · Code gen
Anthropic	Research Engineer, Frontier Red Team (Autonomy)	AI Frontier	10	Agent orchestration · Tool use · Guardrails · Embodied AI · RL robotics · Agent research
OpenAI	Research Engineer, Frontier Evals & Environments - Finance	AI Frontier	10	Frontier research · Agent research
Anthropic	Anthropic AI Safety Fellow, UK	AI Frontier	10	Frontier research · Interpretability · Guardrails · RLHF
Anthropic	Anthropic AI Safety Fellow, US	AI Frontier	10	Frontier research · Interpretability · Guardrails · RL post-training
Anthropic	Staff Research Engineer, Discovery Team	AI Frontier	10	Frontier research · Pretraining · Fine-tuning · Inference infra · Model serving · Agent orchestration
OpenAI	Research Engineer, Frontier Evals & Environments	AI Frontier	10	RL robotics · Agent research · Frontier research · LLM observability · RL post-training
OpenAI	Research Engineer / Research Scientist -Personal AGI, Proactivity	AI Frontier	9	RL post-training · Agent research · Agent orchestration
Anthropic	Research Engineer, Domain Scaling	AI Frontier	9	RL post-training · Synthetic data · Reward modeling · Fine-tuning
OpenAI	Forward Deployed Engineer - Stockholm	AI Frontier	9	Model serving · Inference infra · LLM observability · Agent orchestration · RAG · Vector DB
Writer	Staff AI research scientist	AI Frontier	9	Frontier research · RL post-training · Agent research · Agent orchestration · Tool use · Fine-tuning · Pretraining · LLM observability
Perplexity	Member of Technical Staff (Software Engineer, Agent Capabilities)	AI Frontier	9	Agent orchestration · Agent research · Model serving
Anthropic	Research Engineer, Code RL (Reinforcement Learning)	AI Frontier	9	RL post-training · Fine-tuning · Agent orchestration · Tool use · Code gen
Sierra	Software Engineer, Agent (Dutch speaking)	AI Frontier	9	Agent orchestration · Model serving · RAG · Agent research
OpenAI	Researcher, Agent Post-Training, Personality	AI Frontier	9	RL post-training · Reward modeling · Agent research · Fine-tuning · LLM observability
Mistral AI	Applied AI Engineer, CyberSecurity	AI Frontier	9	Agent orchestration · Tool use · RAG
Anthropic	Software Engineer, Safeguards Evals	AI Frontier	9	Agent orchestration · Guardrails · LLM observability · Synthetic data · Agent research · RL post-training
OpenAI	Researcher: Agent Post-Training, API & Power-Users	AI Frontier	9	RL post-training · Agent orchestration · Tool use · Fine-tuning · Model serving
Anthropic	Product Manager, Claude Code Model Performance	AI Frontier	9	Agent orchestration · Code gen · LLM observability
Anthropic	Research Scientist, Life Sciences	AI Frontier	9	Agent orchestration · Tool use · Fine-tuning · RL post-training
OpenAI	Software Engineer, Cyber Frontier	AI Frontier	9	Guardrails · Model serving · Frontier research
OpenAI	Researcher, Artifacts - Agent Post-Training	AI Frontier	9	RL post-training · Agent orchestration · Fine-tuning · Model serving · Synthetic data · Agent research

Frequently asked questions

What is Evals in AI?
Designing benchmarks and automated scoring systems to measure model quality, safety, or capability — typically blending classical metrics, LLM-as-judge, and human review. Primary AI lifecycle stage: evaluation.
How many AI roles reference Evals right now?
2,040 active AI roles across 208 companies in our index reference Evals as of today.
Which companies are hiring for Evals roles?
The companies with the most active Evals listings are: Amazon (188 roles), Google (153 roles), OpenAI (95 roles), Microsoft (73 roles), JPMorgan Chase (70 roles).
What AI lifecycle stage does Evals belong to?
Evals is classified under the evaluation stage of the AI lifecycle. In current hiring, however, most Evals-tagged roles sit at the agents stage (57%), with 12% at the evaluation stage.
What sectors invest most in Evals?
The sectors with the most active Evals hiring are: Big Tech, Enterprise, AI Frontier.