Senior Research Scientist, Model Evaluation

Cohere · AI Frontier · Toronto, ON · Modeling

Cohere is seeking a Senior Research Scientist focused on Model Evaluation to create next-generation evaluation methods and infrastructure for LLMs. This role involves developing new benchmarks, advancing state-of-the-art evaluation techniques, and building scalable tools to measure LLM progress and capabilities.

What you'd actually do

  1. Create ambitious new evaluation benchmarks that push the limits of what our models can accomplish.
  2. Work on highly cross-functional teams to translate model feedback into trustworthy, repeatable evaluations.
  3. Conduct research to advance the state-of-the-art in LLM evaluation methods, including training LLM judges, refining LLM-based data synthesis pipelines, and improving evaluation efficiency.
  4. Build scalable and reusable tools for digging into model performance.

Skills

Required

  • strong software engineering skills

Nice to have

  • creating next-generation evaluation methods
  • training LLM judges
  • LLM-based data synthesis pipelines
  • improving evaluation efficiency
  • building scalable and reusable tools for digging into model performance

What the JD emphasized

  • rigorously measuring AI capabilities
  • measurements actually align with the capabilities you care about

Other signals

  • creating next-generation evaluation methods
  • measure LLM progress
  • advance the state-of-the-art in LLM evaluation methods