New AI postings mentioning Benchmarking per week — 166 total over 12 weeks.
Benchmarking is a skill in the "ML Ops & Evaluation" category. It currently appears in 293 active AI roles across 73 companies in our index.
The top employers with active AI roles mentioning Benchmarking are: NVIDIA (61), Amazon (38), Google (18), Microsoft (13), OpenAI (8).
Over the last 12 weeks, 166 new AI postings mentioned Benchmarking. Demand is rising: postings in the last four weeks of the window are up 200% compared with the earliest four weeks.
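To make the "up 200%" figure concrete, here is a minimal sketch of how such a trend number could be computed. It assumes the comparison is between the total postings in the first four weeks and the total in the last four weeks of the window; the index's actual method is not stated. The function name and the weekly counts are hypothetical and are not data from the index.

```python
def pct_change_first_vs_last(weekly_counts, window=4):
    """Percent change from the first `window` weeks to the last `window` weeks.

    Assumption: the trend compares block totals; comparing block averages
    would give the same percentage because both blocks have equal length.
    """
    if len(weekly_counts) < 2 * window:
        raise ValueError("need at least 2 * window weekly counts")
    first = sum(weekly_counts[:window])   # baseline: earliest weeks
    last = sum(weekly_counts[-window:])   # most recent weeks
    if first == 0:
        raise ValueError("baseline is zero; percent change is undefined")
    return (last - first) / first * 100.0


# Hypothetical 12-week series, for illustration only (not the index's data).
example = [5, 5, 5, 5, 8, 8, 8, 8, 15, 15, 15, 15]
print(pct_change_first_vs_last(example))  # 200.0
```

Under this reading, a 200% rise means the most recent four weeks saw three times as many new postings as the earliest four weeks (20 versus 60 in the hypothetical series).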
Roles requiring Benchmarking are concentrated in serving infrastructure (44%), agents (22%), and post-training (11%). These are stages of a seven-stage AI lifecycle that runs from data preparation through to shipped product.
Job postings that mention Benchmarking most often also require: Machine Learning, Computer Architecture, Python, Production ML Systems, GPU Computing.
12 AI roles requiring this skill.
| Company | Title | Sector | AI score | Stage |
|---|---|---|---|---|
| Anthropic | Research Engineer, Pretraining Scaling | AI Frontier | 10 | L1 |
| Anthropic | Research Engineer, Domain Scaling | AI Frontier | 9 | L0 |
| Writer | Staff AI research scientist | AI Frontier | 9 | L2 |
| Lila Sciences | Machine Learning Scientist I/II, Multi-Modal Scientific Reasonings | AI Frontier | 9 | L2 |
| Cohere | Senior Research Scientist, Model Evaluation | AI Frontier | 9 | L5 |
| Lila Sciences | Research Scientist, Frontier Capabilities | AI Frontier | 9 | L2 |
| Anthropic | Research Engineer, Pretraining Scaling - London | AI Frontier | 9 | L1 |
| Cohere | Member of Technical Staff, Agent Code | AI Frontier | 9 | L4 |
| Cohere | Senior Research Engineer, Model Evaluation | AI Frontier | 9 | L5 |
| Hume AI | AI Researcher | AI Frontier | 9 | L2 |
| Lila Sciences | Co-op, LLMs for Decision Making | AI Frontier | 8 | L4 |
| Mistral AI | Technical Program Manager, Science Operations, Code | AI Frontier | 8 | L2 |