24 AI roles tagged Evals.
| Company | Title | Sector | AI score | Other tags |
|---|---|---|---|---|
| Skydio | Staff Product Manager, Vehicle AI | Defense | 9 | Agent orchestration · Multimodal |
| Shield AI | Principal Engineer, AI Infrastructure (R4941) | Defense | 9 | Model serving · Inference infra · Fine-tuning · Multimodal · Embodied AI |
| Anduril | AI Sorcerer | Defense | 9 | Agent orchestration · Tool use · Multimodal |
| Skydio | Senior Autonomy Engineer - Deep Learning | Defense | 9 | Vision · Fine-tuning · Inference infra · Model serving · Synthetic data |
| Shield AI | Product Manager, AI Platforms (R4991) | Defense | 9 | Multimodal · Training infra · Synthetic data · Inference infra · Model serving |
| Anduril | Senior Machine Learning Engineer | Defense | 8 | Agent orchestration · Fine-tuning · Inference infra · Model serving |
| Anduril | Senior Machine Learning Engineer, Perception | Defense | 8 | Model serving · Inference infra |
| Anduril | Generative AI Integration Engineer | Defense | 8 | Agent orchestration · Tool use · LLM observability · Multimodal |
| Anduril | Lead Systems Engineer, Battlespace | Defense | 7 | Model serving |
| Anduril | Flight Test Engineer - Mission Autonomy | Defense | 7 | Agent orchestration |
| Anduril | Senior Test and Evaluation Engineer - Mission Autonomy | Defense | 7 | Agent orchestration |
| Anduril | Test and Evaluation Engineer - Mission Autonomy | Defense | 7 | |
| Shield AI | Director, Test Engineering - ACP Programs (R5044) | Defense | 7 | Agent orchestration · Embodied AI |
| Shield AI | Engineer II, Systems Test (R5038) | Defense | 7 | |
| Anduril | Systems Engineer, Battlespace | Defense | 7 | Agent orchestration · Model serving |
| Anduril | Senior Manager, Software Engineering | Defense | 7 | Model serving · Inference infra · LLM observability · Guardrails |
| Anduril | Staff Software Engineer | Defense | 7 | Agent orchestration · Multimodal |
| Shield AI | Staff Engineer, Operations Analysis (R4757) | Defense | 7 | Agent research · Embodied AI |
| Anduril | Staff Flight Test Engineer - Mission Autonomy (UAS) | Defense | 7 | Agent orchestration |
| Anduril | Senior Flight Test Engineer - Mission Autonomy (UAS) | Defense | 7 | Agent orchestration |
| Anduril | Senior Flight Test Engineer - Mission Autonomy | Defense | 7 | Agent orchestration |
| Anduril | Robotics Software Engineer, Verification & Validation | Defense | 5 | Agent orchestration |
| Saronic | Mission Operations Engineer | Defense | 5 | Agent orchestration |
| Saronic | Mission Operations Engineer (Product) | Defense | 5 | Agent orchestration · Model serving |

Designing benchmarks and automated scoring systems to measure model quality, safety, or capability, typically blending classical metrics, LLM-as-judge, and human review. Primary AI lifecycle stage: evaluation.

As of today, 2,040 active AI roles across 208 companies in our index reference Evals. Hiring concentrates at the agents (57%) and evaluation (12%) stages. The most common sectors are Big Tech, Enterprise, and AI Frontier.
The companies with the most active Evals listings are Amazon (188 roles), Google (153 roles), OpenAI (95 roles), Microsoft (73 roles), and JPMorgan Chase (70 roles).