Cohere

Scaling

AI Frontier · Enterprise LLMs

Currently tracking 60 active AI roles, with new openings up 80% versus the prior 4 weeks. Primary focus: Agent · Engineering.

Hiring: 60 / 60
Momentum (4w): +16 (+80%) · 36 opens last 4w · 20 prior 4w
Salary range: n/a
Tracked since: Oct '24 · last role today
Hiring velocity (new roles per week):

Oct 28: 3 · Dec 2: 1 · Jan 27: 1 · Feb 17: 2 · May 19: 1
Jun 9: 1 · Jun 23: 2 · Jul 7: 2 · Aug 18: 1 · Aug 25: 1
Sep 8: 1 · Sep 15: 1 · Sep 22: 1 · Sep 29: 1 · Oct 6: 2
Oct 20: 3 · Oct 27: 2 · Nov 3: 5 · Nov 10: 2 · Nov 24: 2
Dec 1: 1 · Dec 8: 2 · Dec 15: 1 · Jan 5: 3 · Jan 12: 9
Jan 26: 5 · Feb 2: 1 · Feb 16: 2 · Feb 23: 2 · Mar 2: 5
Mar 9: 6 · Mar 16: 7 · Mar 23: 5 · Mar 30: 4 · Apr 6: 4
Apr 13: 7 · Apr 20: 6 · Apr 27: 11 · May 4: 12

Jobs (14)

69 AI · 128 total active
Filter: Stage = Serve
Title · Stage · Function · Location · First seen · AI score
Staff Research Engineer, Model Efficiency
Cohere is seeking a Staff Research Engineer focused on Model Efficiency to push the limits of LLM inference efficiency. This role involves exploring and shipping breakthroughs in model architecture, routing optimization, decoding algorithms, software/hardware co-design for GPU acceleration, and performance optimization without compromising model quality. The goal is to improve how fast and efficiently their foundation models run in production.
Serve · Pretrain · Engineering · New York, NY · Nov '25 · 9
Member of Technical Staff, Model Efficiency
Cohere is seeking an engineer to improve LLM inference efficiency by optimizing model execution, reducing latency and increasing throughput. This role involves deep dives into model execution, identifying bottlenecks, and developing optimizations across the inference stack, including GPU/CUDA and kernel-level improvements.
Serve · Engineering · New York, NY · Nov '25 · 9
Member of Technical Staff, Modeling
Cohere is seeking a Member of Technical Staff, Modeling to design, build, and scale AI systems for serving users. This role involves researching, implementing, and experimenting with ideas on supercompute and data infrastructure, with a strong emphasis on both research and production code. The position requires strong software engineering skills, proficiency in Python and ML frameworks, experience with large-scale distributed training, and GPU programming.
Serve · Pretrain · Research · London, United Kingdom · Nov '24 · 9
Lead Member of Technical Staff, Inference Infrastructure
Lead Member of Technical Staff, Inference Infrastructure at Cohere. Responsible for the design, deployment, and operation of the AI platform delivering large language models through API endpoints. Focuses on optimizing NLP models for low latency, high throughput, and high availability, with a strong emphasis on Kubernetes, GPU workloads, and multi-cloud environments. Requires extensive experience in production infrastructure, distributed systems, and technical leadership, including mentoring engineers and guiding strategic infrastructure decisions.
Serve · Engineering · San Francisco, CA · 2w ago · 8
Staff Software Engineer, GPU Infrastructure (HPC)
Staff Software Engineer focused on building and scaling ML-optimized HPC infrastructure, including Kubernetes-based GPU/TPU superclusters, optimizing for AI/ML training cost efficiency, reliability, and performance, and enabling researchers with self-service tools.
Serve · Engineering · Canada · Jan 15 · 8
Site Reliability Engineer, Inference Infrastructure
Cohere is seeking a Site Reliability Engineer to join their Model Serving team. This role focuses on building, deploying, and operating the AI platform that delivers Cohere's large language models through API endpoints. The engineer will work on high-performance, scalable, and reliable machine learning systems, ensuring low latency, high throughput, and high availability for NLP model deployments. Responsibilities include automating service management, environment observability, and resilience, while collaborating with internal developers and influencing the infrastructure roadmap.
Serve · Engineering · Toronto, ON · Jan 12 · 8
Staff Software Engineer, Inference Infrastructure
Cohere is seeking a Staff Software Engineer to join their Model Serving team. This role focuses on developing, deploying, and operating the AI platform that delivers Cohere's large language models via API endpoints. The engineer will optimize NLP models for low latency, high throughput, and high availability, working with distributed systems, Kubernetes, and GPU workloads. Experience with cloud platforms and high-performance languages is required.
Serve · Engineering · San Francisco, CA · Jan 12 · 8
Audio Inference Engineer, Model Efficiency
Cohere is seeking an Audio Inference Engineer to optimize audio inference serving efficiency, focusing on latency, throughput, and quality for real-time and streaming audio workloads. The role involves deep system analysis, bottleneck identification, and developing creative solutions for audio processing and inference.
Serve · Post-train · Engineering · New York, NY · Nov '25 · 8
Software Engineer, Internal Infrastructure (North America)
Software Engineer focused on building and operating internal infrastructure for training, evaluating, and serving foundational AI models. This includes managing Kubernetes GPU superclusters, optimizing cloud infrastructure for AI workloads, and designing scalable systems for model training, with a strong emphasis on stability, scalability, and observability.
Serve · Data · Engineering · Toronto, ON · Oct '25 · 8
Engineering Manager, Agentic Platform
Engineering Manager to lead a team of Forward Deployed Engineers responsible for deploying Cohere's AI platform (North) into customer environments, focusing on private cloud and on-premises deployments, technical implementation, performance optimization, and scaling infrastructure.
Serve · Engineering · Europe · 6d ago · 7
Software Engineer Intern (Fall / Winter 2026)
Cohere is seeking Software Engineer Interns to help build and deploy AI systems, focusing on areas like content generation, semantic search, RAG, and agents. Interns will contribute to machine learning datasets, API serving infrastructure, security features, and internal tooling, with opportunities to ship code to production.
Serve · Engineering · Canada · 1w ago · 7
Product Manager, Platform Experience & Developer Product
Product Manager for Cohere's Platform Experience and Developer Product, focusing on how developers and enterprise technical teams build on, integrate with, and operate Cohere's model platform. This includes managed services, APIs, SDKs, and developer tooling.
Serve · Agent · Product · Toronto, ON · 8w ago · 7
Engineering Manager, FDE Infrastructure (EMEA)
Engineering Manager to lead a Forward Deployed Engineers team in EMEA, responsible for the end-to-end deployment of Cohere's North platform into customer environments (private cloud, on-premises). The role involves mentoring engineers, collaborating with Product/Engineering/Sales, optimizing infrastructure (OpenSearch, K8s services), and defining scaling guidelines for compute resources. Requires strong software engineering and leadership experience, with hands-on deployment of enterprise software at scale, cloud infrastructure, and Kubernetes.
Serve · Engineering · United Arab Emirates · 3w ago · 5
Full-Stack Software Engineer, Inference
This role focuses on the full-stack engineering of Cohere's inference platform, specifically enhancing customer-facing systems like authentication, billing, payments, and the interactive Playground. The engineer will also work on deployment management features and ensure code runs efficiently in low-resource environments with stringent security and privacy controls.
Serve · Engineering · Toronto, ON · Jan 12 · 5