What you'd actually do

Design and develop cutting-edge multimodal AI systems, integrating various modalities such as text, speech, and vision.

Conduct research and experiments on our advanced compute infrastructure, exploring novel ideas in multimodal representation learning, transfer learning, and more.

Collaborate closely with our world-class teams, learning from and contributing to their expertise in the field.

What the JD emphasized

exceptional software engineering skills

proven track record of building robust and scalable systems

strong command of Python

well-versed in popular deep learning frameworks like JAX, PyTorch, and TensorFlow

knowledge of distributed training strategies, especially for large-scale multimodal models

familiarity with autoregressive models, particularly their application in multimodal tasks such as image or video captioning and speech-to-text generation

tuning and optimising large multimodal models

experience building evaluations to measure their performance

comfortable diving into complex ML codebases to identify and resolve issues

thrive in a fast-paced, technically challenging environment

history of delivering creative, practical solutions to real-world problems

Who are we?

Our mission is to scale intelligence to serve humanity. We’re training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, RAG, and agents. We believe that our work is instrumental to the widespread adoption of AI.

We obsess over what we build. Each of us is responsible for helping to increase the capabilities of our models and the value they drive for our customers. We like to work hard and move fast to do what’s best for our customers.

Cohere is a team of researchers, engineers, designers, and more, who are passionate about their craft. Each person is one of the best in the world at what they do. We believe that a diverse range of perspectives is a requirement for building great products.

Join us on our mission and shape the future!

Why this role?

At Cohere, we believe in the power of multimodal AI to revolutionise the way we interact with technology. Our engineering teams push the boundaries of what's possible, and we're looking for talented individuals to join us on this exciting journey. With an exceptional ratio of compute resources to engineers, we provide an ideal environment for you to explore, innovate and shape the future of AI.

On July 31st, 2025, Cohere's Multimodal team introduced Command A Vision: Multimodal AI Built for Business. At release, our new flagship vision-language model:

● Consistently outperforms major models like Llama 4 Maverick, Mistral Medium/Pixtral Large, and GPT-4.1
● Scores an 83.1% benchmark average (73.5% MathVista, 90.9% ChartQA...)
● Is built for the real world: 112B parameters running on just 2 GPUs
● Has open weights live on Hugging Face

With a focused team, breakthrough performance doesn't require breakthrough compute. Focus on the things that matter, and join the team.

As a Member of Technical Staff with a focus on Multimodal AI, you will:

Design and develop cutting-edge multimodal AI systems, integrating various modalities such as text, speech, and vision.
Conduct research and experiments on our advanced compute infrastructure, exploring novel ideas in multimodal representation learning, transfer learning, and more.
Collaborate closely with our world-class teams, learning from and contributing to their expertise in the field.

You are an ideal candidate if you:

Possess exceptional software engineering skills, with a proven track record of building robust and scalable systems.
Have a strong command of Python and are well-versed in popular deep learning frameworks like JAX, PyTorch, and TensorFlow, with an understanding of their multimodal capabilities.
Have knowledge of distributed training strategies, especially for large-scale multimodal models.
Are familiar with autoregressive models, particularly their application in multimodal tasks such as image or video captioning and speech-to-text generation.
Bonus: Publications in top-tier venues demonstrating your expertise in multimodal AI research.
Bonus: Experience in writing efficient GPU kernels using CUDA, optimising performance for multimodal tasks.

This role is perfect for you if you:

Have a deep passion for machine learning and its potential to impact various industries through multimodal applications.
Enjoy tuning and optimising large multimodal models, and have experience building evaluations to measure their performance.
Are comfortable diving into complex ML codebases to identify and resolve issues, ensuring the smooth operation of our systems.
Thrive in a fast-paced, technically challenging environment, where you can contribute your innovative ideas and solutions.
Have a history of delivering creative, practical solutions to real-world problems, demonstrating your ability to think outside the box.

If you're excited about the potential of multimodal AI and want to be at the forefront of this rapidly evolving field, we want to hear from you! Join us at Cohere and be a part of a diverse, remote-friendly team that's changing the world of AI.

Please Note: We have offices in Toronto, London, Paris, San Francisco, and New York, but we welcome applications from anywhere in the world. Our team is spread across the globe, and we embrace the benefits of a remote-friendly culture.

If some of the above doesn’t line up perfectly with your experience, we still encourage you to apply!

We value and celebrate diversity and strive to create an inclusive work environment for all. We welcome applicants from all backgrounds and are committed to providing equal opportunities. Should you require any accommodations during the recruitment process, please submit an Accommodations Request Form, and we will work together to meet your needs.

Full-Time Employees at Cohere enjoy these Perks:

🤝 An open and inclusive culture and work environment

🧑‍💻 Work closely with a team on the cutting edge of AI research

🍽 Weekly lunch stipend, in-office lunches & snacks

🦷 Full health and dental benefits, including a separate budget to take care of your mental health

🐣 100% Parental Leave top-up for up to 6 months

🎨 Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement

🏙 Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend

✈️ 6 weeks of vacation (30 working days!)