Cohere has 77 active AI-related job listings. The largest share of these roles focuses on agents, at 39% of the total. Engineering is the dominant function, with 60 positions. The company is actively hiring for roles related to model serving, agent orchestration, and fine-tuning. In the last 30 days, Cohere has posted 17 new AI roles, a significant increase compared to the previous 30-day period.
Currently tracking 69 active AI roles, down 44% versus the prior 4 weeks. Primary focus: Agent · Engineering.
Cohere currently has 83 active AI-related roles in our index. The most common open titles are: Forward Deployed Engineer, Agentic Platform (2); Solutions Architect - Public Sector (2); Applied AI Engineer - Agentic Workflows (Singapore); Applied AI Engineer – Agentic Workflows; Applied AI Engineer – Agentic Workflows (Korea). Most positions are in Engineering and Research.
Cohere's active AI hiring is concentrated in: agents (36%), data (20%), serving infrastructure (17%). These categories follow a seven-stage AI lifecycle: data, pre-training, post-training, serving infrastructure, agents, evaluation, and application.
Cohere is hiring AI talent in: Canada (37 roles), United States (15 roles), United Kingdom (15 roles), France (3 roles).
Job postings at Cohere most frequently reference: model serving, agent orchestration, fine-tuning, RAG, and inference infrastructure.
In the past 30 days, Cohere has posted 12 new AI-related roles, a 33% decrease versus the prior 30 days (18 → 12).
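The period-over-period figure can be checked with a short calculation. The following is a minimal sketch, assuming the change is computed as (current − previous) / previous; the function name `percent_change` is hypothetical and not part of any tracker API.

```python
def percent_change(previous: int, current: int) -> float:
    """Return the relative change from `previous` to `current`, in percent.

    Assumed formula: (current - previous) / previous * 100.
    """
    if previous == 0:
        # A relative change has no meaning without a non-zero baseline.
        raise ValueError("percent change is undefined when the previous count is 0")
    return (current - previous) / previous * 100.0


# Counts quoted in the text: 18 new roles in the prior 30 days, 12 in the past 30 days.
change = percent_change(18, 12)
print(f"{change:.0f}%")  # prints "-33%"
```

A negative result indicates fewer new postings than in the prior window.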
| Title | Description | Stage | AI score |
|---|---|---|---|
| Staff Research Engineer, Model Efficiency | Cohere is seeking a Staff Research Engineer focused on Model Efficiency to push the limits of LLM inference efficiency. This role involves exploring and shipping breakthroughs in model architecture, routing optimization, decoding algorithms, software/hardware co-design for GPU acceleration, and performance optimization without compromising model quality. The goal is to improve how fast and efficiently their foundation models run in production. | Serve, Pretrain | 9 |
| Member of Technical Staff, Model Efficiency | Cohere is seeking an engineer to improve LLM inference efficiency by optimizing model execution, reducing latency and increasing throughput. This role involves deep dives into model execution, identifying bottlenecks, and developing optimizations across the inference stack, including GPU/CUDA and kernel-level improvements. | Serve | 9 |
| Lead Member of Technical Staff, Inference Infrastructure | Lead Member of Technical Staff, Inference Infrastructure at Cohere. Responsible for the design, deployment, and operation of the AI platform delivering large language models through API endpoints. Focuses on optimizing NLP models for low latency, high throughput, and high availability, with a strong emphasis on Kubernetes, GPU workloads, and multi-cloud environments. Requires extensive experience in production infrastructure, distributed systems, and technical leadership, including mentoring engineers and guiding strategic infrastructure decisions. | Serve | 8 |
| Staff Software Engineer, GPU Infrastructure (HPC) | Staff Software Engineer focused on building and scaling ML-optimized HPC infrastructure, including Kubernetes-based GPU/TPU superclusters, optimizing for AI/ML training cost efficiency, reliability, and performance, and enabling researchers with self-service tools. | Serve | 8 |
| Site Reliability Engineer, Inference Infrastructure | Cohere is seeking a Site Reliability Engineer to join their Model Serving team. This role focuses on building, deploying, and operating the AI platform that delivers Cohere's large language models through API endpoints. The engineer will work on high-performance, scalable, and reliable machine learning systems, ensuring low latency, high throughput, and high availability for NLP model deployments. Responsibilities include automating service management, environment observability, and resilience, while collaborating with internal developers and influencing the infrastructure roadmap. | Serve | 8 |
| Staff Software Engineer, Inference Infrastructure | Cohere is seeking a Staff Software Engineer to join their Model Serving team. This role focuses on developing, deploying, and operating the AI platform that delivers Cohere's large language models via API endpoints. The engineer will optimize NLP models for low latency, high throughput, and high availability, working with distributed systems, Kubernetes, and GPU workloads. Experience with cloud platforms and high-performance languages is required. | Serve | 8 |
| Audio Inference Engineer, Model Efficiency | Cohere is seeking an Audio Inference Engineer to optimize audio inference serving efficiency, focusing on latency, throughput, and quality for real-time and streaming audio workloads. The role involves deep system analysis, bottleneck identification, and developing creative solutions for audio processing and inference. | Serve, Post-train | 8 |
| Software Engineer, Internal Infrastructure (North America) | Software Engineer focused on building and operating internal infrastructure for training, evaluating, and serving foundational AI models. This includes managing Kubernetes GPU superclusters, optimizing cloud infrastructure for AI workloads, and designing scalable systems for model training, with a strong emphasis on stability, scalability, and observability. | Serve, Data | 8 |
| Senior Search Applications Performance Engineer | Cohere is seeking a Senior Search Applications Performance Engineer to optimize and scale their AI-powered search services, focusing on performance, latency, and integration with new tool surfaces for agentic users. The role involves implementing monitoring, developing benchmarking frameworks, collaborating with modeling teams, and ensuring high availability and low latency for search services. | Serve, Agent | 7 |
| Engineering Manager, Agentic Platform | Engineering Manager to lead a team of Forward Deployed Engineers responsible for deploying Cohere's AI platform (North) into customer environments, focusing on private cloud and on-premises deployments, technical implementation, performance optimization, and scaling infrastructure. | Serve | 7 |
| Software Engineer Intern (Fall / Winter 2026) | Cohere is seeking Software Engineer Interns to help build and deploy AI systems, focusing on areas like content generation, semantic search, RAG, and agents. Interns will contribute to machine learning datasets, API serving infrastructure, security features, and internal tooling, with opportunities to ship code to production. | Serve | 7 |