Data Engineer

Cohere · AI Frontier · New York, NY · Agentic Platform

Cohere is seeking a Data Engineer to build foundational infrastructure for AI systems, with work spanning storage infrastructure, product launches, and new customer experiences. The role involves collaborating with researchers and engineers, running implementations end-to-end, and partnering with research, marketing, sales, and finance to help define how Cohere grows. The ideal candidate has 5+ years of experience with production-grade data processing systems, strong Python and SQL skills, and experience with distributed data processing frameworks such as Apache Beam, Spark, or Flink.

What you'd actually do

  1. Work directly on storage infrastructure, product launches, and new customer experiences built on one of the most advanced AI systems in the world
  2. Collaborate daily with researchers and engineers who are some of the best in the world at what they do
  3. Run implementations end-to-end and see initiatives through to real outcomes — no waiting around to be told what to do
  4. Partner across research, marketing, sales, and finance to help define how Cohere grows, with your recommendations feeding directly into products and strategy

Skills

Required

  • 5+ years of experience working on production-grade data processing systems
  • Strong command of Python and SQL
  • Experience with distributed data processing frameworks such as Apache Beam, Spark, or Flink
  • The ability to transform unstructured data into performant datasets across diverse storage backends including S3, GCS, and POSIX

Nice to have

  • Experience with modern orchestration platforms, especially Kubernetes
  • Familiarity with modern analytics stack tooling such as BigQuery, Airflow, or dbt
  • Knowledge of Java or Golang
  • Genuine excitement about AI
  • Comfort operating at the edge of what's known, with a desire to build something genuinely new rather than optimise what already exists

What the JD emphasized

  • production-grade data processing systems
  • transform unstructured data into performant datasets

Other signals

  • building foundational infrastructure for AI
  • deploying frontier models