Designing benchmarks and automated scoring systems to measure model quality, safety, or capability, typically by blending classical metrics, LLM-as-judge scoring, and human review. Primary AI lifecycle stage: evaluation.
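The minimal Python sketch below roughly illustrates how the three scoring sources in this definition can be combined. It makes several assumptions. The classical metric is token-overlap F1. The hypothetical `keyword_judge` function stands in for a real LLM-as-judge call, which would send the prompt, prediction, and reference to a grading model. A human reviewer's score, when present, overrides the automated scores. The names `Example` and `composite_score` and the 0.4/0.6 weights are illustrative assumptions, not a standard API.

```python
"""Minimal eval-harness sketch: combine a classical metric, an LLM-as-judge
score, and optional human review into one score per example.

Assumptions (hypothetical, for illustration only): the judge is any callable
returning a score in [0, 1], and the blend weights are arbitrary.
"""
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional


def token_f1(prediction: str, reference: str) -> float:
    """Classical metric: token-overlap F1 (lowercasing only, no punctuation stripping)."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


@dataclass
class Example:
    prompt: str
    reference: str
    prediction: str
    human_score: Optional[float] = None  # set only if a reviewer graded this item


def composite_score(ex: Example, judge: Callable[[str, str, str], float],
                    w_metric: float = 0.4, w_judge: float = 0.6) -> float:
    """Weighted blend of metric and judge scores; a human grade, if present, takes precedence."""
    if ex.human_score is not None:
        return ex.human_score
    metric = token_f1(ex.prediction, ex.reference)
    judged = judge(ex.prompt, ex.prediction, ex.reference)
    return w_metric * metric + w_judge * judged


def keyword_judge(prompt: str, prediction: str, reference: str) -> float:
    """Hypothetical stand-in for an LLM-as-judge call: returns 1.0 if the
    reference answer appears anywhere in the prediction, else 0.0."""
    return 1.0 if reference.lower() in prediction.lower() else 0.0


if __name__ == "__main__":
    examples = [
        Example("Capital of France?", "Paris", "Paris"),
        Example("Capital of France?", "Paris", "It is Paris."),
        Example("2 + 2?", "4", "five", human_score=0.0),
    ]
    scores = [composite_score(e, keyword_judge) for e in examples]
    print([round(s, 3) for s in scores])
    print("mean:", round(sum(scores) / len(scores), 3))
```

The second example scores 0.6 rather than 1.0. The unnormalized F1 treats `paris.` and `paris` as different tokens, but the stand-in judge still finds the answer. Disagreement like this between a metric and a judge is one reason eval pipelines blend several signals and route uncertain cases to human review.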
2,040 active AI roles across 208 companies in our index reference Evals as of today.
The companies with the most active Evals listings are Amazon (188 roles), Google (153), OpenAI (95), Microsoft (73), and JPMorgan Chase (70).
Although Evals belongs primarily to the evaluation stage of the AI lifecycle, current Evals hiring concentrates in the agents stage (57%), followed by the evaluation stage (12%).
The sectors with the most active Evals hiring are Big Tech, Enterprise, and AI Frontier.
20 AI roles tagged Evals:
| Company | Title | Sector | AI score | Other tags |
|---|---|---|---|---|
| Cognite | Senior AI Platform Engineer, Atlas AI | Industrial | 8 | Agent orchestration · Tool use · LLM observability · Model serving · Inference infra |
| Caterpillar | Analyst Applications – ServiceNow Conversational & GenAI | Industrial | 7 | Agent orchestration · LLM observability · Guardrails · RAG · Fine-tuning · Model serving |
| Caterpillar | Senior Manager - Connectivity Data Analytics | Industrial | 7 | Agent orchestration · Guardrails · LLM observability · Fine-tuning · Model serving · Recommender systems · Multimodal |
| Caterpillar | Senior Autonomy Development Engineer - SOTIF | Industrial | 7 | |
| Caterpillar | Senior Statistician – AI Product Assurance | Industrial | 7 | Agent research · Agent orchestration |
| Caterpillar | Senior Analytics Manager - AI Model & Prompt Engineering | Industrial | 7 | Agent orchestration · Tool use · Guardrails · RAG · Vector DB · Fine-tuning · Model serving · Multimodal · Agent research |
| Caterpillar | Senior Manager, Software Test Engineering | Industrial | 7 | Guardrails · LLM observability · Model serving |
| Caterpillar | Senior Autonomy Validation Engineer | Industrial | 7 | |
| Honeywell | AI Engr II | Industrial | 7 | Fine-tuning · Model serving |
| Caterpillar | Autonomy Application Architect | Industrial | 7 | Agent orchestration |
| Caterpillar | Principal GenAI Product Manager - Dealer and Customer Support | Industrial | 7 | Agent orchestration · LLM observability · Guardrails · Model serving |
| Caterpillar | Lead Data Scientist | Industrial | 7 | Agent orchestration · RAG · Multimodal |
| Honeywell | Sr AI Manager | Industrial | 7 | Agent orchestration · Tool use · LLM observability · RAG · Vector DB · Model serving |
| Honeywell | Advanced Cyber Sec Archt/Engr | Industrial | 7 | Agent orchestration · Tool use · Guardrails · RAG · LLM observability |
| Caterpillar | Autonomy Validation Engineer | Industrial | 5 | |
| Caterpillar | Senior Autonomy Project Team Lead | Industrial | 5 | |
| Caterpillar | Automation Engineer | Industrial | 5 | Agent orchestration · Tool use · LLM observability |
| Caterpillar | Autonomy Validation Engineer | Industrial | 5 | |
| Caterpillar | Autonomy Validation Engineer | Industrial | 5 | |
| Honeywell | Advanced Software Engr | Industrial | 5 | Agent orchestration |