125 AI roles tagged evals.
| Company | Title | Sector | AI score | Other tags |
|---|---|---|---|---|
| Block | Principal Engineer, AI Systems | Fintech | 9 | Agent orchestration · Tool use · RAG · Fine-tuning · LLM observability · Agent research |
| Ramp | Senior Growth Operator, Partner | Fintech | 9 | Agent orchestration · Tool use · Guardrails |
| SoFi | Staff AI Engineer | Fintech | 9 | Agent orchestration · Tool use · LLM observability · RAG |
| Upstart | Principal Engineer, LLM | Fintech | 9 | Inference infra · Model serving · RAG · Vector DB · LLM observability |
| Ramp | Agentic Operator, Growth Marketing | Fintech | 9 | Agent orchestration · Tool use · Guardrails · RAG · Fine-tuning · LLM observability |
| Gusto | Sr. Staff AI/ML Engineer | Fintech | 9 | Agent orchestration · RAG · Model serving · LLM observability · Guardrails |
| PayPal | Staff Machine Learning Engineer | Fintech | 8 | Agent orchestration · Tool use · Guardrails · Fine-tuning · Recommender systems · Agent research |
| Plaid | Engineering Manager, AI Applications | Fintech | 8 | Agent orchestration · Fine-tuning · RAG · Vector DB · LLM observability · RLHF · Agent research |
| Plaid | Staff Software Engineer - Instant Access | Fintech | 8 | Agent orchestration · LLM observability · Guardrails · Model serving · Code gen |
| Block | Staff Applied Machine Learning Engineer - Fraud & Abuse | Fintech | 8 | Model serving · Inference infra · Agent orchestration |
| Block | Senior Machine Learning Engineer, Model Risk Management | Fintech | 8 | Agent orchestration · LLM observability |
| Robinhood | Senior Software Engineer | Fintech | 8 | Model serving · Inference infra · Fine-tuning |
| Ripple | Staff Software Engineer, GenAI Platform | Fintech | 8 | Agent orchestration · Tool use · Guardrails · LLM observability · RAG · Inference infra · Model serving |
| Robinhood | Senior Engineering Manager, Agentic AI | Fintech | 8 | Agent orchestration · Tool use · LLM observability · Model serving |
| Gusto | AI Solutions Architect | Fintech | 8 | Agent orchestration · Tool use · Guardrails · LLM observability |
| Visa | Senior Manager- (Agentic AI Development/Solution/Client facing) | Fintech | 8 | Agent orchestration · Tool use · RAG · Fine-tuning · LLM observability |
| Betterment | Sr. Staff Engineer | Fintech | 8 | Agent orchestration · LLM observability · Guardrails · Model serving |
| Visa | Staff ML Scientist | Fintech | 8 | Model serving · Fine-tuning |
| Visa | Senior Machine Learning Scientist | Fintech | 8 | Fine-tuning · Model serving |
| Visa | Client Consulting Analyst | Fintech | 8 | Agent orchestration · RAG · Guardrails · Fine-tuning |
| SoFi | Director, AI Platforms | Fintech | 8 | Model serving · Inference infra · Agent orchestration · RAG · LLM observability · Guardrails |
| Robinhood | Staff Product Manager, Cortex | Fintech | 8 | Agent orchestration · RAG · Guardrails · LLM observability |
| Stripe | Machine Learning Engineer, Support Experience | Fintech | 8 | Agent orchestration · Tool use · RAG · Fine-tuning · LLM observability |
| Polymarket | AI Ops Specialist | Fintech | 8 | Agent orchestration · Tool use · RAG · LLM observability |
| Mercury | Senior Software Engineer - AI Engineering | Fintech | 8 | Agent orchestration · RAG · Guardrails · LLM observability · Model serving · Inference infra |
| PayPal | Sr Machine Learning Engineer | Fintech | 8 | Agent orchestration · Tool use · Agent research · LLM observability · RAG · Fine-tuning · Model serving |
| Carta | Senior Machine Learning Engineer II | Fintech | 8 | Agent orchestration · LLM observability |
| Upstart | Principal Product Manager, Agentic Platform | Fintech | 8 | Agent orchestration · Tool use · RAG · Guardrails · LLM observability |
| PayPal | Sr Machine Learning Engineer — Agentic Systems | Fintech | 8 | Agent orchestration · Tool use · Guardrails · RAG · Fine-tuning · Model serving |
| SoFi | Principal Product Manager, AI SDLC | Fintech | 8 | Agent orchestration · Tool use · LLM observability · Fine-tuning · Model serving |
Designing benchmarks and automated scoring systems to measure model quality, safety, or capability — typically blending classical metrics, LLM-as-judge, and human review.
Primary AI lifecycle stage: evaluation.
As of today, 2,040 active AI roles across 208 companies in our index reference Evals. Hiring concentrates at the agents (57%) and evaluation (12%) stages. Most common sectors: Big Tech, Enterprise, AI Frontier.
The companies with the most active Evals listings are Amazon (188 roles), Google (153 roles), OpenAI (95 roles), Microsoft (73 roles), and JPMorgan Chase (70 roles).