Designing benchmarks and automated scoring systems to measure model quality, safety, or capability — typically blending classical metrics, LLM-as-judge, and human review. Primary AI lifecycle stage: evaluation.
2,040 active AI roles across 208 companies in our index reference Evals as of today.
The companies with the most active Evals listings are Amazon (188 roles), Google (153 roles), OpenAI (95 roles), Microsoft (73 roles), and JPMorgan Chase (70 roles).
Although Evals belongs primarily to the evaluation stage of the AI lifecycle, current hiring for Evals roles concentrates at the agents (57%) and evaluation (12%) stages.
The sectors with the most active Evals hiring are Big Tech, Enterprise, and AI Frontier.
430 AI roles tagged evals.
| Company | Title | Sector | AI score | Other tags |
|---|---|---|---|---|
| OpenAI | Researcher, Context - Agent Post-Training | AI Frontier | 10 | RL post-training · Agent research · Synthetic data · Agent orchestration · Tool use · LLM observability |
| OpenAI | Researcher, Connectors - Agent Post-Training | AI Frontier | 10 | RL post-training · Agent orchestration · Tool use · Fine-tuning · Model serving · Agent research |
| OpenAI | Researcher, Computer Use - Agent Post-Training | AI Frontier | 10 | RL post-training · Agent orchestration · Synthetic data · Fine-tuning · Agent research |
| OpenAI | Researcher, Misalignment Research | AI Frontier | 10 | Guardrails · Agent research · Frontier research |
| Mistral AI | AI Scientist - Zurich | AI Frontier | 10 | Frontier research · Pretraining · Agent research · Multimodal · Audio & speech · Code gen · Model serving · Fine-tuning |
| Mistral AI | AI Scientist - Paris/London - Onsite or Hybrid or Remote | AI Frontier | 10 | Frontier research · Pretraining · Fine-tuning · Model serving · Multimodal · Audio & speech · Agent research |
| Mistral AI | AI Scientist - Palo Alto | AI Frontier | 10 | Frontier research · Pretraining · Agent research · Multimodal · Audio & speech · Code gen · Model serving |
| OpenAI | Researcher, Loss of Control | AI Frontier | 10 | Agent orchestration · Tool use · Guardrails · LLM observability · Agent research |
| Anthropic | Research Engineer, Machine Learning (Reinforcement Learning) | AI Frontier | 10 | Agent orchestration · Tool use · RL post-training · Frontier research · Code gen |
| Anthropic | Research Engineer, Frontier Red Team (Autonomy) | AI Frontier | 10 | Agent orchestration · Tool use · Guardrails · Embodied AI · RL robotics · Agent research |
| OpenAI | Research Engineer, Frontier Evals & Environments - Finance | AI Frontier | 10 | Frontier research · Agent research |
| Anthropic | Anthropic AI Safety Fellow, UK | AI Frontier | 10 | Frontier research · Interpretability · Guardrails · RLHF |
| Anthropic | Anthropic AI Safety Fellow, US | AI Frontier | 10 | Frontier research · Interpretability · Guardrails · RL post-training |
| Anthropic | Staff Research Engineer, Discovery Team | AI Frontier | 10 | Frontier research · Pretraining · Fine-tuning · Inference infra · Model serving · Agent orchestration |
| OpenAI | Research Engineer, Frontier Evals & Environments | AI Frontier | 10 | RL robotics · Agent research · Frontier research · LLM observability · RL post-training |
| OpenAI | Research Engineer / Research Scientist -Personal AGI, Proactivity | AI Frontier | 9 | RL post-training · Agent research · Agent orchestration |
| Anthropic | Research Engineer, Domain Scaling | AI Frontier | 9 | RL post-training · Synthetic data · Reward modeling · Fine-tuning |
| OpenAI | Forward Deployed Engineer - Stockholm | AI Frontier | 9 | Model serving · Inference infra · LLM observability · Agent orchestration · RAG · Vector DB |
| Writer | Staff AI research scientist | AI Frontier | 9 | Frontier research · RL post-training · Agent research · Agent orchestration · Tool use · Fine-tuning · Pretraining · LLM observability |
| Perplexity | Member of Technical Staff (Software Engineer, Agent Capabilities) | AI Frontier | 9 | Agent orchestration · Agent research · Model serving |
| Anthropic | Research Engineer, Code RL (Reinforcement Learning) | AI Frontier | 9 | RL post-training · Fine-tuning · Agent orchestration · Tool use · Code gen |
| Sierra | Software Engineer, Agent (Dutch speaking) | AI Frontier | 9 | Agent orchestration · Model serving · RAG · Agent research |
| OpenAI | Researcher, Agent Post-Training, Personality | AI Frontier | 9 | RL post-training · Reward modeling · Agent research · Fine-tuning · LLM observability |
| Mistral AI | Applied AI Engineer, CyberSecurity | AI Frontier | 9 | Agent orchestration · Tool use · RAG |
| Anthropic | Software Engineer, Safeguards Evals | AI Frontier | 9 | Agent orchestration · Guardrails · LLM observability · Synthetic data · Agent research · RL post-training |
| OpenAI | Researcher: Agent Post-Training, API & Power-Users | AI Frontier | 9 | RL post-training · Agent orchestration · Tool use · Fine-tuning · Model serving |
| Anthropic | Product Manager, Claude Code Model Performance | AI Frontier | 9 | Agent orchestration · Code gen · LLM observability |
| Anthropic | Research Scientist, Life Sciences | AI Frontier | 9 | Agent orchestration · Tool use · Fine-tuning · RL post-training |
| OpenAI | Software Engineer, Cyber Frontier | AI Frontier | 9 | Guardrails · Model serving · Frontier research |
| OpenAI | Researcher, Artifacts - Agent Post-Training | AI Frontier | 9 | RL post-training · Agent orchestration · Fine-tuning · Model serving · Synthetic data · Agent research |