Designing benchmarks and automated scoring systems to measure model quality, safety, or capability — typically blending classical metrics, LLM-as-judge, and human review.
Primary AI lifecycle stage: evaluation.
As of today, 2,040 active AI roles across 208 companies in our index reference Evals. Hiring concentrates at the agents (57%) and evaluation (12%) stages. Most common sectors: Big Tech, Enterprise, AI Frontier.
The companies with the most active Evals listings are Amazon (188 roles), Google (153 roles), OpenAI (95 roles), Microsoft (73 roles), and JPMorgan Chase (70 roles).
13 AI roles tagged Evals are listed below, all in the Aerospace sector.
| Company | Title | Sector | AI score | Other tags |
|---|---|---|---|---|
| Northrop Grumman | Principal / Sr Principal AI Software Engineer | Aerospace | 8 | Model serving · Inference infra · Fine-tuning |
| Northrop Grumman | AI Systems Engineer (Principal or Sr. Principal Level) | Aerospace | 8 | Fine-tuning · Model serving · Inference infra |
| Northrop Grumman | AI Engineer Software (Level 2 or 3) | Aerospace | 8 | RAG · Agent orchestration · Tool use · Fine-tuning · Model serving · LLM observability |
| Northrop Grumman | Staff AI Software Engineer | Aerospace | 8 | Agent orchestration · Fine-tuning |
| Northrop Grumman | Software Engineer 2/3 | Aerospace | 7 | Agent orchestration · Tool use · RAG · LLM observability |
| Boeing | Data Scientist (Data Science) | Aerospace | 7 | Fine-tuning · RAG · Vector DB · Model serving |
| Boeing | Senior Data Scientist | Aerospace | 7 | Fine-tuning · Guardrails · Multimodal · LLM observability · RAG · Vector DB · Model serving |
| Boeing | Senior Human Resources Data Scientist | Aerospace | 7 | Fine-tuning · Model serving |
| Northrop Grumman | AI Systems Engineer (Principal or Sr. Principal Level) | Aerospace | 7 | Agent orchestration · Agent research |
| Northrop Grumman | AI Systems Engineer (Engineer or Principal Engineer Level) | Aerospace | 7 | Agent orchestration · Agent research |
| RTX | AI Solution Architect (Hybrid) | Aerospace | 7 | Guardrails · RAG · LLM observability |
| Boeing | Senior Business Intelligence and Governance Architect | Aerospace | 5 | Guardrails · Interpretability · Synthetic data |
| Boeing | Mid-Level Business Intelligence Analyst | Aerospace | 5 | Agent orchestration · Tool use · Guardrails · LLM observability · RAG · Vector DB · Fine-tuning · Inference infra · Model serving |