Audio Inference Engineer, Model Efficiency

Cohere · AI Frontier · New York, NY · Modeling

Cohere is seeking an Audio Inference Engineer to optimize audio inference serving efficiency, focusing on latency, throughput, and quality for real-time and streaming audio workloads. The role involves deep system analysis, bottleneck identification, and the development of creative solutions for audio processing and inference.

What you'd actually do

  1. build reliable machine learning systems and optimize audio inference serving efficiency using innovative techniques
  2. work on advancing core audio model serving metrics, including latency, throughput, and quality, by diving deep into our systems, identifying bottlenecks, and delivering creative solutions for audio processing and streaming workloads
  3. collaborate closely with both the training and serving infrastructure teams to ensure seamless integration between model development and deployment, with a special focus on real-time and streaming audio inference

Skills

Required

  • C++
  • Python
  • high-performance audio or machine learning inference systems
  • deep learning models for audio, speech, or language applications

Nice to have

  • GPU programming
  • low-level system optimization
  • model parallelization techniques over multiple GPUs
  • duplex real-time streaming architectures
  • internals of machine learning frameworks for audio (such as PyTorch, TensorFlow, or specialized audio libraries)
  • inference frameworks such as vLLM, SGLang, TensorRT-LLM, or custom distributed inference systems
  • sequence modeling (e.g., transformers for audio/speech)
  • end-to-end audio pipeline optimization

What the JD emphasized

  • high-performance audio or machine learning inference systems
  • deep learning models for audio, speech, or language applications
  • real-time streaming architectures
  • inference frameworks such as vLLM, SGLang, TensorRT-LLM, or custom distributed inference systems
  • sequence modeling (e.g., transformers for audio/speech) and end-to-end audio pipeline optimization

Other signals

  • optimize audio inference serving efficiency
  • advancing core audio model serving metrics
  • real-time and streaming audio inference