Designing benchmarks and automated scoring systems to measure model quality, safety, or capability, typically blending classical metrics, LLM-as-judge scoring, and human review.
Primary AI lifecycle stage: evaluation.
As of today, 2,040 active AI roles across 208 companies in our index reference Evals. Hiring concentrates at the agents (57%) and evaluation (12%) stages. Most common sectors: Big Tech, Enterprise, AI Frontier.
The companies with the most active Evals listings are: Amazon (188 roles), Google (153 roles), OpenAI (95 roles), Microsoft (73 roles), JPMorgan Chase (70 roles).
42 AI roles tagged Evals.
| Company | Title | Sector | AI score | Other tags |
|---|---|---|---|---|
| Disney | Lead Machine Learning Engineer | Media | 9 | Agent orchestration · Agent research · Multimodal · RAG · LLM observability · Guardrails · Model serving · Inference infra |
| Warner Bros Discovery | Manager, Machine Learning Engineering | Media | 8 | Model serving |
| Disney | Lead Product Manager, AI Platform | Media | 8 | Agent orchestration · RAG · LLM observability · Model serving |
| Disney | Director, Decision Science AI/ML Engineering & Ops | Media | 8 | Model serving · Inference infra · LLM observability · Guardrails |
| Comcast | Engineer 4 - Machine Learning | Media | 8 | Agent orchestration · LLM observability · Model serving · Fine-tuning · Guardrails |
| Disney | Omni-Channel Analytics Mgr | Media | 8 | Agent orchestration · Agent research · Fine-tuning · Guardrails · LLM observability |
| Disney | Manager - Applied AI | Media | 8 | Agent orchestration · Tool use · RAG · LLM observability · Model serving |
| Disney | Staff GenAI/ML Engineer (Emerging Tech & AI Automation) Project Hire | Media | 8 | Agent orchestration · RAG · Fine-tuning · Model serving · Vector DB · LLM observability |
| The Trade Desk | Staff Product Manager, Agentic AI | Media | 8 | Agent orchestration · LLM observability · Guardrails |
| Comcast | Software Engineer - Agentic AI | Media | 8 | Agent orchestration · Tool use · LLM observability · RAG · Agent research |
| Comcast | Comcast AI Research Intern | Media | 8 | Fine-tuning · RL post-training · Synthetic data · Agent research |
| Comcast | Principal Engineer - Agentic AI | Media | 8 | Agent orchestration · Agent research · LLM observability · Tool use |
| Disney | Sr Data Scientist | Media | 8 | Multimodal · Fine-tuning · RAG · Inference infra · Model serving · Vector DB |
| Comcast | Software Engineering Manager, AI Agents | Media | 8 | Agent orchestration · Tool use · Guardrails · LLM observability · Model serving |
| Comcast | Engineer 3, Agentic AI | Media | 8 | Agent orchestration · Tool use · LLM observability · Agent research · Code gen |
| Disney | Staff GenAI/ML Engineer (Emerging Tech & AI Automation) Project Hire | Media | 8 | Agent orchestration · RAG · Vector DB · Fine-tuning · Model serving · LLM observability |
| Comcast | Machine Learning Engineer 4 | Media | 8 | Agent orchestration · LLM observability · Guardrails · Model serving · Inference infra |
| Comcast | Sr. Software Engineer - Agentic AI | Media | 8 | Agent orchestration · Agent research · LLM observability · RAG |
| Comcast | Agent Evaluation Engineer | Media | 8 | Agent orchestration · LLM observability · Guardrails |
| Warner Bros Discovery | Sr. Staff, Data Science & Applied AI | Media | 8 | Agent orchestration · RAG · Guardrails · LLM observability · Model serving |
| Disney | Lead Data Scientist, Ad Research | Media | 8 | Agent orchestration · Multimodal · Vision |
| Disney | Senior Machine Learning Engineer, Ad Platforms | Media | 8 | Agent orchestration · Multimodal · Fine-tuning · Model serving · Audio & speech |
| Disney | Lead Machine Learning Engineer, Ad Platforms | Media | 8 | Recommender systems · Search & ranking · Fine-tuning · RAG · LLM observability · Multimodal · Vision |
| Disney | VP, Analytics Engineering & DnA Operations | Media | 7 | Agent orchestration · Guardrails · LLM observability |
| Comcast | Development Engineer in Test (SDET) – ML & LLM Systems | Media | 7 | LLM observability · Fine-tuning · Model serving |
| Comcast | Software Development Engineer in Test (SDET) – ML & LLM Systems | Media | 7 | LLM observability · Fine-tuning · Model serving |
| Comcast | Quality Engineering Lead – Agent Evaluation & AI Platforms | Media | 7 | Agent orchestration · Guardrails · LLM observability |
| Comcast | Agentic AI Test Engineer | Media | 7 | Agent orchestration · LLM observability |
| Comcast | Sr. Python Engineer, Agentic AI | Media | 7 | Agent orchestration · Tool use · Guardrails · LLM observability |
| Warner Bros Discovery | Manager, Machine Learning Engineer | Media | 7 | Model serving |