Staff Backend Engineer, Voices

Synthesia · Multimodal · EUROPE · Engineering

Staff Backend Engineer role focused on the core speech and voice generation experience for Synthesia's AI video platform. The role involves designing and delivering features across the script preview and voice orchestration stack, integrating with Text-to-Speech (TTS) providers, and ensuring voice output quality. Responsibilities span backend systems for TTS provider orchestration, frontend experiences for voice selection, and voice discovery and recommendation systems. The role emphasizes working on 0 to 1 problems, collaborating with product and AI teams, and shipping features end-to-end.

What you'd actually do

  1. Design and deliver features across the script preview and voice orchestration stack, combining frontend user experiences with backend platform reliability.
  2. Integrate with multiple Text-to-Speech (TTS) providers, build recommendation systems, and ensure consistency and quality across all voice outputs.
  3. Take ownership of features from idea through to production, working with loosely defined requirements to scope, prototype, and ship solutions that deliver real user impact.
  4. Build backend systems for TTS provider orchestration, handling fallbacks, retries, and load-shedding across multiple providers.
  5. Build voice discovery and recommendation systems that guide users to high-quality voices and help them iterate quickly.
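
As a rough illustration of the provider orchestration described in item 4, the sketch below tries providers in priority order, retries each a fixed number of times, and skips any provider that is already saturated. It is a minimal sketch under stated assumptions, not Synthesia's implementation: `Provider`, `ProviderError`, `synthesize_with_fallback`, and the `max_in_flight` threshold are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error raised when a provider call fails."""


@dataclass
class Provider:
    name: str
    synthesize: Callable[[str], bytes]  # hypothetical: text in, audio bytes out
    max_in_flight: int = 10             # assumed load-shedding threshold
    in_flight: int = 0                  # requests currently being served


def synthesize_with_fallback(
    text: str, providers: Sequence[Provider], retries: int = 2
) -> Tuple[str, bytes]:
    """Return (provider name, audio) from the first provider that succeeds."""
    for provider in providers:
        # Load-shedding: skip a saturated provider instead of queueing behind it.
        if provider.in_flight >= provider.max_in_flight:
            continue
        for _ in range(retries):
            provider.in_flight += 1
            try:
                return provider.name, provider.synthesize(text)
            except ProviderError:
                pass  # retry; once attempts run out, fall back to the next provider
            finally:
                provider.in_flight -= 1
    raise ProviderError("all providers failed or were saturated")
```

A production version would likely add backoff between retries, timeouts, and thread-safe counters; those are left out here to keep the control flow visible.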

Skills

Required

  • Backend systems development
  • API integration
  • System design
  • Problem-solving
  • Collaboration with product and design teams
  • Working in 0 to 1 environments
  • Shipping features end-to-end

Nice to have

  • Audio/speech systems experience
  • TTS experience
  • API orchestration
  • Provider integrations
  • Quality evaluation frameworks
  • Frontend development

What the JD emphasized

  • core speech and voice generation experience
  • critical path of script creation and video generation
  • voice orchestration stack
  • Text-to-Speech (TTS) providers
  • voice outputs
  • 0 to 1 problems
  • voice quality frameworks
  • new TTS capabilities
  • voice quality data
  • shipping product features end-to-end
  • ambiguous problems
  • evolving requirements
  • audio/speech systems
  • TTS
  • API orchestration
  • provider integrations
  • quality evaluation frameworks

Other signals

  • AI video platform
  • speech and voice generation
  • Text-to-Speech (TTS) providers
  • voice quality frameworks
  • recommendation systems