Cohere
Scaling · AI Frontier · Enterprise LLMs
Currently tracking 60 active AI roles, with new openings up 80% versus the prior 4 weeks. Primary focus: Agent · Engineering.
Hiring
60 / 60
Momentum (4w)
↑ +16 (+80%)
36 opens last 4w · 20 prior 4w
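For reference, a minimal sketch of how the momentum figures above fit together, assuming the +16 / +80% values are the absolute and relative change in new openings between the two 4-week windows (the 36 and 20 shown above):

```python
# Momentum over the trailing 4-week window, assuming it is computed as the
# change in new openings versus the prior 4-week window.
opens_last_4w = 36   # roles opened in the most recent 4 weeks
opens_prior_4w = 20  # roles opened in the 4 weeks before that

delta = opens_last_4w - opens_prior_4w      # +16
pct_change = 100 * delta / opens_prior_4w   # +80.0

print(f"↑ +{delta} ({pct_change:+.0f}%)")   # ↑ +16 (+80%)
```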
Salary range
—
Tracked since
Oct '24
Last role posted today
Hiring velocity (weekly chart)
Jobs (5)
| Title | Description | Stage | AI score |
|---|---|---|---|
| Staff Research Engineer, Model Efficiency | Cohere is seeking a Staff Research Engineer focused on Model Efficiency to push the limits of LLM inference efficiency. The role involves exploring and shipping breakthroughs in model architecture, routing optimization, decoding algorithms, software/hardware co-design for GPU acceleration, and performance optimization without compromising model quality. The goal is to improve how fast and efficiently Cohere's foundation models run in production. | Serve · Pretrain | 9 |
| Member of Technical Staff, Model Efficiency | Cohere is seeking an engineer to improve LLM inference efficiency by optimizing model execution, reducing latency, and increasing throughput. The role involves deep dives into model execution, identifying bottlenecks, and developing optimizations across the inference stack, including GPU/CUDA and kernel-level improvements. | Serve | 9 |
| Lead Member of Technical Staff, Inference Infrastructure | Responsible for the design, deployment, and operation of the AI platform delivering large language models through API endpoints. Focuses on optimizing NLP models for low latency, high throughput, and high availability, with a strong emphasis on Kubernetes, GPU workloads, and multi-cloud environments. Requires extensive experience in production infrastructure, distributed systems, and technical leadership, including mentoring engineers and guiding strategic infrastructure decisions. | Serve | 8 |
| Staff Software Engineer, Inference Infrastructure | Cohere is seeking a Staff Software Engineer to join its Model Serving team. The role focuses on developing, deploying, and operating the AI platform that delivers Cohere's large language models via API endpoints. The engineer will optimize NLP models for low latency, high throughput, and high availability, working with distributed systems, Kubernetes, and GPU workloads. Experience with cloud platforms and high-performance languages is required. | Serve | 8 |
| Audio Inference Engineer, Model Efficiency | Cohere is seeking an Audio Inference Engineer to optimize audio inference serving efficiency, focusing on latency, throughput, and quality for real-time and streaming audio workloads. The role involves deep system analysis, bottleneck identification, and developing creative solutions for audio processing and inference. | Serve · Post-train | 8 |