Designing benchmarks and automated scoring systems to measure model quality, safety, or capability — typically blending classical metrics, LLM-as-judge, and human review.
Primary AI lifecycle stage: evaluation.
As of today, 1,975 active AI roles across 203 companies in our index reference Evals. Hiring concentrates at the agents (55%) and evaluation (12%) stages. Most common sectors: Big Tech, Enterprise, AI Frontier.
2,089 AI roles tagged Evals.
| Company | Title | Sector | AI score | Other tags |
|---|---|---|---|---|
| | Senior Staff Software Engineer, Cognitive Architecture, Special Projects | Big Tech | 10 | Interpretability · Agent orchestration · Agent research · Audio & speech · RL robotics · Model serving |
| Wayve | Engineering Internship, Enrichment and Curation | Robotics | 9 | Embodied AI · Multimodal · Vision · Pretraining · Fine-tuning |
| | Forward Deployed Engineer III, GenAI | Big Tech | 9 | Agent orchestration · RAG · Vector DB · LLM observability · Model serving |
| | Partner Forward Deployed Engineer IV, Generative AI, Google Cloud | Big Tech | 9 | Agent orchestration · RAG · Vector DB · LLM observability |
| ClickHouse | AI Product Engineer - ClickStack | Data AI | 9 | Agent orchestration · Tool use · LLM observability · RAG |
| ClickHouse | AI Product Engineer - ClickStack | Data AI | 9 | Agent orchestration · Tool use · LLM observability · RAG |
| ClickHouse | AI Product Engineer - ClickStack | Data AI | 9 | Agent orchestration · Tool use · LLM observability · RAG |
| ClickHouse | AI Product Engineer - ClickStack | Data AI | 9 | Agent orchestration · Tool use · LLM observability · RAG · Model serving |
| ClickHouse | AI Product Engineer - ClickStack | Data AI | 9 | Agent orchestration · Tool use · LLM observability · Model serving |
| | Software Engineer III, Multimodal Agentic AI, XR | Big Tech | 9 | Agent orchestration · Multimodal · Vision · Inference infra · Model serving · Fine-tuning · Agent research |
| | Partner Forward Deployed Engineer, Generative AI, Google Cloud (Japanese, English) | Big Tech | 9 | Agent orchestration · RAG · Vector DB · LLM observability · Model serving |
| Okta | Staff Product Security Engineer | Enterprise | 9 | Agent orchestration · Agent research · Guardrails · LLM observability · Tool use |
| | Forward Deployed Engineer IV, Generative AI, Google Cloud | Big Tech | 9 | Agent orchestration · RAG · Vector DB · LLM observability · Model serving |
| Adobe | Applied Scientist 5.5 | Enterprise | 9 | Fine-tuning · Inference infra · Model serving · Multimodal |
| | Staff Software Engineer, Generative AI | Big Tech | 9 | Agent orchestration · Inference infra · Model serving · Fine-tuning · Multimodal · Vision |
| | Senior Staff Software Engineer, Agentic Data Tooling, DeepMind | Big Tech | 9 | Agent orchestration · Tool use · LLM observability · RAG · Fine-tuning · RL post-training · Embodied AI |
| | Partner Forward Deployed Engineer, Generative AI, Google Cloud | Big Tech | 9 | Agent orchestration · RAG · Vector DB · LLM observability · Model serving |
| JPMorgan Chase | Applied Machine Learning Scientist - Vice President | Banking | 9 | Agent orchestration · Tool use · Guardrails · LLM observability · RAG · Fine-tuning · Model serving · Recommender systems · Multimodal · Agent research · RL post-training |
| Ramp | Senior Growth Operator, Partner | Fintech | 9 | Agent orchestration · Tool use · Guardrails |
| Augury | Senior GenAI Engineer | Vertical AI | 9 | Agent orchestration · Tool use · RAG · Vector DB · Fine-tuning · Model serving · LLM observability · Multimodal |
| Wayve | Tech Lead, Autonomy Performance - Robotaxi | Robotics | 9 | Embodied AI · Model serving · Inference infra |
| Amazon | Applied Sciences Manager, Ads Brand Safety and Suitability | Big Tech | 9 | LLM observability · Model serving · Inference infra · Multimodal · Guardrails · Fine-tuning |
| Netflix | Data Scientist 5 - AI Evals | Big Tech | 9 | LLM observability · Agent orchestration · RAG · Agent research · Guardrails |
| Oracle | Snr Director, Applied Science | Enterprise | 9 | Multimodal · Agent orchestration · Model serving · Inference infra · RAG · Guardrails · LLM observability · Vision · Audio & speech |
| | Senior Staff Software Engineer, AI/ML, Applied AI | Big Tech | 9 | Agent orchestration · Multimodal · Audio & speech · Model serving |
| Microsoft | Member of Technical Staff, Microsoft Robotics (Robot Learning) | Big Tech | 9 | Embodied AI · RL robotics · Vision · Multimodal · Fine-tuning · Model serving |
| | Forward Deployed Developer III, Generative AI, Google Cloud | Big Tech | 9 | Agent orchestration · RAG · Vector DB · LLM observability |
| Apptronik | Staff MLOps Engineer | Robotics | 9 | Model serving · Inference infra |
| | Forward Deployed Engineer, Generative AI, Google Cloud (Korean, English) | Big Tech | 9 | Agent orchestration · RAG · Vector DB · Model serving · LLM observability |
| Wayve | Machine Learning Engineer, App SW | Robotics | 9 | Embodied AI · Model serving · Synthetic data |
The companies with the most active Evals listings are: Amazon (177 roles), Google (125 roles), OpenAI (90 roles), Microsoft (75 roles), Anthropic (67 roles).