Senior Research Engineer, Model Evaluation

Cohere · AI Frontier · Toronto, ON · Modeling

This role focuses on developing and implementing next-generation evaluation methods and scalable infrastructure for measuring LLM progress. The Senior Research Engineer will create evaluation benchmarks, datasets, and environments, conduct research to advance LLM evaluation techniques, and build tools for analyzing evaluation results. The role requires a strong background in LLM evaluation, research, and software engineering.

What you'd actually do

  1. Develop evaluation benchmarks, datasets, and environments for measuring the bleeding edge of model capabilities
  2. Conduct research to push the state of the art in LLM evaluation methods, including training LLM judges, improving evaluation efficiency, and building high-quality datasets at scale
  3. Build scalable tools for investigating and understanding evaluation results that are used by all members of technical staff at Cohere, as well as leadership and our CEO
  4. Learn from and work with the best researchers and engineers in the field

Skills

Required

  • Experience developing evaluation benchmarks, datasets, and environments for LLMs
  • Research experience in LLM evaluation methods
  • Experience building scalable tools for analyzing LLM performance
  • Strong software engineering skills
  • Experience building with and around LLMs

Nice to have

  • Publications at top-tier conferences
  • Experience with popular benchmarks

What the JD emphasized

  • creating next-generation evaluation methods
  • scalable infrastructure to measure LLM progress
  • develop evaluation benchmarks, datasets, and environments
  • push the state-of-the-art in LLM evaluation methods
  • built high-quality evaluation resources
  • track record of developing new methods and/or data to evaluate LLMs

Other signals

  • developing next-generation evaluation methods
  • creating evaluation benchmarks, datasets, and environments