Designing benchmarks and automated scoring systems to measure model quality, safety, or capability — typically blending classical metrics, LLM-as-judge, and human review. Primary AI lifecycle stage: evaluation.
2,040 active AI roles across 208 companies in our index reference Evals as of today.
The companies with the most active Evals listings are: Amazon (188 roles), Google (153 roles), OpenAI (95 roles), Microsoft (73 roles), JPMorgan Chase (70 roles).
Evals belongs primarily to the evaluation stage of the AI lifecycle. In current hiring, Evals roles concentrate in two stages: agents (57%) and evaluation (12%).
The sectors with the most active Evals hiring are: Big Tech, Enterprise, AI Frontier.
206 AI roles tagged evals.
| Company | Title | Sector | AI score | Other tags |
|---|---|---|---|---|
| Together AI | Research Intern, Model Shaping (Fall 2026) | Data AI | 9 | Fine-tuning · RL post-training · Frontier research · Model serving |
| Together AI | Frontier Agents Intern (Fall 2026) | Data AI | 9 | Agent orchestration · Agent research · Frontier research · RL post-training · Multimodal · Audio & speech · LLM observability |
| Snowflake | Post-Doctoral Researcher (Fixed-Term) | Data AI | 9 | Frontier research · Fine-tuning · RAG · Interpretability · Multimodal |
| Fireworks AI | AI Field Engineer - Enterprise | Data AI | 9 | Model serving · Inference infra · Fine-tuning |
| Fireworks AI | AI Field Engineer - Microsoft Foundry | Data AI | 9 | Model serving · Inference infra · Fine-tuning · Tool use · Agent orchestration |
| Fireworks AI | AI Field Engineer - AI Natives | Data AI | 9 | Inference infra · Model serving · Fine-tuning |
| ClickHouse | AI Product Engineer - ClickStack | Data AI | 9 | Agent orchestration · Tool use · LLM observability · RAG |
| ClickHouse | AI Product Engineer - ClickStack | Data AI | 9 | Agent orchestration · Tool use · LLM observability · RAG |
| ClickHouse | AI Product Engineer - ClickStack | Data AI | 9 | Agent orchestration · Tool use · LLM observability · RAG |
| ClickHouse | AI Product Engineer - ClickStack | Data AI | 9 | Agent orchestration · Tool use · LLM observability · RAG · Model serving |
| ClickHouse | AI Product Engineer - ClickStack | Data AI | 9 | Agent orchestration · Tool use · LLM observability · Model serving |
| Snorkel AI | Research Scientist - Frontier Benchmarks | Data AI | 9 | Frontier research · Synthetic data · LLM observability |
| Mixpanel | Senior Software Engineer, AI Platform | Data AI | 9 | Agent orchestration · Model serving · Inference infra · LLM observability · RAG |
| Amplitude | Staff AI Engineer | Data AI | 9 | Agent orchestration · Tool use · RAG · LLM observability · Model serving |
| Snowflake | Senior Engineering Manager - AI Forward Deployed Engineering for EMEA | Data AI | 9 | Agent orchestration · RAG · Fine-tuning · Guardrails · LLM observability · Model serving |
| Scale AI | Research Scientist, Safety Post Training | Data AI | 9 | RL post-training · Interpretability · Guardrails |
| Databricks | AI Engineer - FDE (Forward Deployed Engineer) | Data AI | 9 | Agent orchestration · RAG · Fine-tuning · Model serving |
| Databricks | Senior Staff Applied AI Engineer - Context Retrieval | Data AI | 9 | RAG · Agent orchestration · Search & ranking · Model serving |
| Snorkel AI | AI Advocate, Open-Source & Research | Data AI | 9 | Fine-tuning · RL post-training · Agent research · Agent orchestration · Tool use · Frontier research · LLM observability |
| Scale AI | Director, Forward Deployed Engineering | Data AI | 9 | Agent orchestration · Model serving · Inference infra |
| Databricks | AI Engineer - FDE (Forward Deployed Engineer) - Public Sector | Data AI | 9 | Agent orchestration · RAG · Fine-tuning · Model serving |
| LangChain | Deployed Engineer (Phoenix) | Data AI | 9 | Agent orchestration · Tool use · Guardrails · LLM observability · Agent research |
| Databricks | AI Engineer - FDE (Forward Deployed Engineer) | Data AI | 9 | RAG · Agent orchestration · Fine-tuning · Model serving |
| Snorkel AI | Senior/Staff Research Scientist - Frontier Benchmarks | Data AI | 9 | Frontier research · Synthetic data |
| LangChain | Solutions Architect (London) | Data AI | 9 | Agent orchestration · Inference infra · Model serving · RAG · Vector DB |
| Snowflake | Staff AI Engineer - Cortex Code Agentic System | Data AI | 9 | Agent orchestration · LLM observability · Guardrails · Model serving |
| Scale AI | Director, Enterprise Machine Learning & Research | Data AI | 9 | RL post-training · Agent research · Frontier research · Model serving |
| Grafana Labs | Staff AI Engineer \| US \| Remote | Data AI | 9 | Agent orchestration · Tool use · RAG · LLM observability · Guardrails |
| Snowflake | Staff Research Scientist, AI Agents & LLMs | Data AI | 9 | Agent orchestration · Agent research · Fine-tuning · Model serving · Inference infra |
| Scale AI | Research Scientist, Frontier Risk Evaluations | Data AI | 9 | Agent orchestration · Guardrails · Frontier research · LLM observability |