Member of Technical Staff, Data Analysis and Evaluation

Cohere · AI Frontier · London, United Kingdom · Modeling

Cohere is seeking a Member of Technical Staff, Data Analysis and Evaluation, to ensure the quality, reliability, and performance of its LLMs. The role involves designing data collection tasks, evaluating dataset quality, analyzing model robustness and generalizability, and collaborating with cross-functional teams. Responsibilities include overseeing data collection, developing statistical methods for dataset evaluation, analyzing the generalizability of ML systems, improving dataset quality and model performance, training LLMs, and conducting experiments.

What you'd actually do

  1. Design and oversee data collection tasks, including supporting human annotators and ensuring data quality.
  2. Develop and apply statistical methods to evaluate the quality and reliability of datasets.
  3. Analyse and assess the generalisability and robustness of ML systems across diverse use cases.
  4. Collaborate with teams to improve dataset quality and model performance.
  5. Train and fine-tune large language models (LLMs) on distributed training infrastructures.

Skills

Required

  • Python
  • PyTorch
  • TensorFlow
  • JAX
  • statistical methods
  • experimental design
  • data analysis
  • ML frameworks

Nice to have

  • working with human annotators
  • model robustness
  • generalizability
  • distributed training infrastructures
  • top-tier publication record

What the JD emphasized

  • Extremely strong software engineering skills.
  • Strong expertise in designing and conducting data collection tasks, including working with human annotators.
  • Strong statistical skills and experience evaluating scientific experiments related to data collection and model performance.
  • Experience analysing datasets with respect to their quality, biases, and suitability for training ML models.
  • Hands-on experience training large language models (LLMs) on distributed training infrastructures.
  • Familiarity with evaluating and improving the generalisability and robustness of ML systems.
  • One or more papers at top-tier venues (such as NeurIPS, ICML, ICLR, AISTATS, MLSys, JMLR, AAAI, Nature, COLING, ACL, EMNLP).

Other signals

  • data quality
  • dataset evaluation
  • model robustness
  • generalizability
  • training data