Currently tracking 66 active AI roles, up 53% versus the prior 4 weeks. Primary focus: Agent · Engineering.
| Title | Stage | AI score |
|---|---|---|
| Model Behavior Architect: This role focuses on defining and measuring LLM behavior, designing and implementing evaluation pipelines, data guidelines, and synthetic testing environments to identify and fix edge cases. It involves interacting with models, gathering feedback, and collaborating with AI Scientists to improve reasoning, audio, alignment, tools, and frontier bets. | Eval Gate, Post-train | 9 |
| Applied AI, Evaluation Engineer: This role focuses on designing and implementing evaluation systems and infrastructure for LLMs, specifically for enterprise clients. The goal is to measure model performance across customer-specific use cases, moving beyond general benchmarks to domain-specific, risk-aware evaluations. The role involves building scalable pipelines, developing new methodologies, and tailoring evaluations to customer needs, bridging research, engineering, and customer-facing teams. | Eval Gate, Post-train | 9 |