Senior Research Engineer - Safety Tooling and Data

Cohere · AI Frontier · New York, NY · Modeling

Senior Research Engineer focused on building tools for data synthesis, analysis, and management to improve AI model safety and reliability. This role involves creating data infrastructure, establishing validation processes, and developing analysis frameworks to support model experimentation and evaluation.

What you'd actually do

  1. Design and implement robust data pipeline tooling that enables frequent, low-friction data generation and annotation
  2. Create cohesive data infrastructure that supports continuous parallel operation with model experimentation
  3. Establish standardized processes for validating, analyzing, and improving both training and evaluation data
  4. Collaborate with the ML modeling team to align data capabilities with experimental needs
  5. Maintain opinionated, well-documented solutions that become team standards

Skills

Required

  • Python
  • ML frameworks (PyTorch, TensorFlow, JAX)
  • Software engineering
  • Statistics
  • Data science
  • Data pipeline tooling
  • Data infrastructure
  • Data validation
  • Analysis frameworks

Nice to have

  • Interest in AI safety and trust

What the JD emphasized

  • building tools to enable easy data synthesis, analysis, and management
  • complex combinations of real and synthetic data
  • tooling repositories
  • tighter experimentation cycles
  • better data coverage of the real world
  • more scientific rigour
  • high level experimental problems
  • efficient pipelines
  • data analysis will collaboratively feed into modelling decisions and experimentation
  • software engineering, statistics, and data science
  • Modelling Safety and Trust team
  • Extremely strong software engineering skills
  • Strong statistical skills and experience evaluating scientific experiments related to data collection and model performance
  • ML data requirements and the intersection of data engineering with modeling workflows

Other signals

  • building tools for data synthesis, analysis, and management
  • owning the cohesive vision of tooling repositories
  • creating tooling that enables tighter experimentation cycles, better data coverage, and more scientific rigor
  • designing and implementing solutions for high-level experimental problems with efficient pipelines
  • developing systematic analysis frameworks to identify incoming data sources and benchmark coverage gaps