Senior Platform Engineer, Voice AI

Together AI · Data AI · San Francisco, CA · Engineering

Senior Platform Engineer for Together AI's Voice AI platform, focusing on the API and infrastructure layer for real-time speech-to-text and text-to-speech models. The role involves building WebSocket and HTTP APIs, designing autoscaling for latency-sensitive streaming, and ensuring platform reliability for production voice agents.

What you'd actually do

  1. Build and harden real-time WebSocket and HTTP streaming APIs for STT and TTS — including connection lifecycle management, backpressure, error handling, and reconnection, at the reliability bar needed for production voice agents.
  2. Design and ship autoscaling for voice model endpoints that handles bursty, real-time traffic patterns — accounting for concurrent connection limits, streaming state, and hard latency ceilings.
  3. Implement voice-specific API features: word-level alignment, real-time speaker diarization, audio format flexibility (G.711/mu-law for telephony, PCM, WebRTC formats), pronunciation controls, and multi-context WebSocket support.
  4. Build voice-specific observability — latency breakdowns, audio quality signals, and dashboards that help both the team and customers debug issues.
  5. Own multi-model normalization across our model partners (Cartesia, Deepgram, Rime, and others), ensuring consistent API behavior regardless of the underlying provider.

Skills

Required

  • 5+ years of experience building large-scale, real-time distributed systems and API services
  • Deep expertise in real-time streaming infrastructure — WebSocket server architecture, Server-Sent Events, bidirectional streaming, connection multiplexing, and stateful protocol design
  • Expert-level programming in TypeScript and Python
  • Strong distributed systems fundamentals: load balancing, autoscaling, rate limiting, and traffic shaping for latency-sensitive workloads
  • Experience with Kubernetes — including custom autoscalers, resource management, and health checking for stateful services
  • Strong product sense — you care about API ergonomics and think about what developers building voice apps actually need
  • Comfort working on a small, early-stage team where you'll wear multiple hats and move fast

Nice to have

  • Experience with Rust
  • Experience with audio or media protocols (WebRTC, G.711, PCM encoding)
  • Familiarity with ML model serving infrastructure and how inference engines work
  • Full-stack experience (React, Next.js)

What the JD emphasized

  • real-time
  • latency-sensitive
  • production voice agents
  • real-time streaming infrastructure
  • latency ceilings

Other signals

  • building the real-time API layer for voice workloads
  • designing autoscaling for latency-sensitive streaming workloads
  • ensuring multi-provider voice platform reliability
  • defining how developers interact with the voice platform