Evals covers designing benchmarks and automated scoring systems to measure model quality, safety, or capability, typically blending classical metrics, LLM-as-judge, and human review.
Primary AI lifecycle stage: evaluation.
As of today, 2,040 active AI roles across 208 companies in our index reference Evals. Hiring concentrates at the agents (57%) and evaluation (12%) stages. Most common sectors: Big Tech, Enterprise, AI Frontier.
The companies with the most active Evals listings are Amazon (188 roles), Google (153), OpenAI (95), Microsoft (73), and JPMorgan Chase (70).
17 AI roles tagged evals.
| Company | Title | Sector | AI score | Other tags |
|---|---|---|---|---|
| Expedia | Senior Machine Learning Scientist - Agentic Experience | Hospitality | 9 | Agent orchestration · Tool use · RAG · LLM observability |
| Expedia | Machine Learning Scientist III - Agentic Experience | Hospitality | 9 | Agent orchestration · Tool use · RAG · LLM observability |
| Booking | Senior Machine Learning Scientist | Hospitality | 9 | Agent orchestration · Agent research · RAG · LLM observability · Tool use |
| Expedia | Senior Machine Learning Engineer (Gen AI & Multi-Agentic Systems) | Hospitality | 9 | Agent orchestration · RAG · Vector DB · Fine-tuning · RL post-training · Inference infra · Model serving · Multimodal · Vision · Audio & speech · Code gen · Guardrails · LLM observability |
| Expedia | Machine Learning Engineer III (Gen AI & Multi-Agentic Systems) | Hospitality | 9 | Agent orchestration · Fine-tuning · RAG · Vector DB · Multimodal · Inference infra · Model serving · LLM observability · Guardrails · RL post-training · Code gen |
| Expedia | Principal Software Development Engineer - Gen AI | Hospitality | 8 | Agent orchestration · Tool use · Guardrails · LLM observability · RAG · Vector DB · Fine-tuning · Model serving |
| Expedia | Senior Machine Learning Scientist | Hospitality | 8 | Agent orchestration · Tool use · RAG · Fine-tuning · Model serving · Recommender systems |
| Booking | Senior Machine Learning Scientist I | Hospitality | 8 | Agent orchestration · RAG · LLM observability · Fine-tuning · Model serving |
| Expedia | Senior Data Scientist, Analytics | Hospitality | 7 | RAG · Model serving |
| Expedia | Principal Quantitative User Experience Researcher, AI | Hospitality | 7 | LLM observability · Agent research · Recommender systems |
| Booking | Senior Solutions Architect - Data & AI | Hospitality | 7 | Agent orchestration · Vector DB · LLM observability · Model serving |
| Expedia | Senior Software Development Engineer - Agentic AI Experience | Hospitality | 7 | Agent orchestration · Multimodal · Guardrails · LLM observability · RAG · Model serving |
| Expedia | Director of Product, AI Builder Experiences | Hospitality | 7 | Agent orchestration · Tool use · RAG · LLM observability |
| Expedia | Senior Product Manager, Lodging Connectivity | Hospitality | 7 | Agent orchestration · Tool use · LLM observability · Fine-tuning · Model serving |
| Expedia | Lead Data Scientist (Resiliency Engineering) | Hospitality | 5 | Model serving · Inference infra · LLM observability |
| Booking | Enterprise Applications Engineering - IC - F | Hospitality | 5 | Agent orchestration · Guardrails · Fine-tuning · Model serving |
| Expedia | Senior Economist - Advertising Technology | Hospitality | 5 | Recommender systems · Search & ranking |