Model Behavior Architect

Mistral AI · AI Frontier · Paris, France · Research

This role focuses on defining and measuring LLM behavior. It covers designing and implementing evaluation pipelines, data guidelines, and synthetic testing environments to identify and fix edge cases. It also involves interacting with models, gathering feedback, and collaborating with AI Scientists to improve reasoning, audio, alignment, tools, and frontier bets.

What you'd actually do

  1. Interact with models to identify where model behavior can be improved
  2. Gather internal and external feedback on model behavior to scope areas for improvement
  3. Design and implement evals, data guidelines, data generation, and synthetic testing environments
  4. Identify and fix edge case behaviors through rigorous testing
  5. Develop robust evaluation pipelines for our model candidates

Skills

Required

  • model evaluation
  • policy writing
  • building evaluation pipelines
  • linguistics
  • language
  • translation
  • engineering
  • code behavior
  • LLM agents
  • reasoning
  • tool use
  • training model behavior
  • optimizing model behavior

Nice to have

  • synthetic testing environments

What the JD emphasized

  • experts in model evaluation
  • creating eval pipelines
  • building robust evaluations
  • rigorous testing
  • model behavior

Other signals

  • model behavior
  • eval pipelines
  • testing environments
  • edge case behaviors
  • model candidates