Senior / Staff+ Software Engineer, Voice Platform

Anthropic · AI Frontier · San Francisco, CA · Software Engineering - Infrastructure

A Senior/Staff+ Software Engineer role on Anthropic's Voice Platform, focused on building and operating real-time streaming infrastructure, low-latency serving systems for speech models, and the APIs behind voice conversations with Claude. The role involves optimizing performance, ensuring reliability, and collaborating with research and product teams to bring audio models from research into production.

What you'd actually do

  1. Design and build the real-time streaming infrastructure that powers voice conversations with Claude—ingesting microphone audio, orchestrating model inference, and streaming synthesized speech back with minimal latency
  2. Build low-latency serving systems for speech models, optimizing time-to-first-audio and end-to-end conversational responsiveness
  3. Develop the public and internal APIs that expose voice capabilities to Claude.ai, mobile clients, and third-party developers
  4. Own the audio transport layer—codecs, jitter buffers, adaptive bitrate, packet loss recovery—so conversations stay smooth across unreliable networks
  5. Build observability and quality-measurement systems for voice: latency distributions, audio quality metrics, interruption handling, and turn-taking accuracy

Skills

Required

  • 6+ years of experience building distributed systems, real-time infrastructure, or platform services at scale
  • Shipped production systems where latency is measured in tens of milliseconds and users notice when you miss
  • Comfortable working across the stack, from transport protocols and serving infrastructure up to the APIs product teams build on
  • Results-oriented, with a bias toward flexibility and impact
  • Pair programming

Nice to have

  • Real-time media protocols and stacks: WebRTC, RTP, gRPC bidirectional streaming, or WebSockets at scale
  • Audio engineering fundamentals: codecs (Opus, AAC), voice activity detection, echo cancellation, jitter buffering, or audio DSP
  • Low-latency ML inference serving, streaming model outputs, or GPU-based serving infrastructure
  • Telephony, live streaming, video conferencing, or voice assistant platforms
  • Mobile audio pipelines on iOS (AVAudioEngine, AudioUnits) or Android (Oboe, AAudio)
  • Working alongside ML researchers to productionize models—speech experience is a plus but not required

What the JD emphasized

  • shipped production systems where latency is measured in tens of milliseconds and users notice when you miss

Other signals

  • shipping production systems
  • low-latency inference
  • real-time streaming infrastructure
  • voice models