Senior Research Engineer - Audio Post-training

Synthesia · Multimodal · EUROPE · Research and Development

Research Engineer role focused on post-training optimization of generative speech and voice synthesis models to reach production-level quality, speed, and robustness. The work involves adapting models to new conditioning inputs, fine-tuning with techniques such as DPO and LoRA, implementing post-training optimizations (quantization, pruning, distillation), integrating novel architectures, and designing evaluation metrics for TTS systems.

What you'd actually do

  1. Adapt models for new conditioning inputs (emotion, speed, prosody, speaker control, etc.).
  2. Fine-tune and optimize speech models using advanced techniques such as DPO (Direct Preference Optimization), LoRA, and other parameter-efficient methods to improve voice quality and expressiveness.
  3. Implement post-training optimization techniques (quantization, pruning, distillation) to improve efficiency and latency in real-time speech generation.
  4. Integrate and test novel architectures, such as neural codecs, diffusion, or flow-matching models, to enhance realism and responsiveness.
  5. Design and implement new evaluation metrics for TTS systems, including automated Mean Opinion Score (MOS) prediction models for continuous quality assessment.

Skills

Required

  • generative modelling
  • large language models (LLMs)
  • transformer-based architectures
  • PyTorch
  • distributed training
  • model optimization
  • time-series modelling
  • tokenization
  • audio
  • speech
  • prototyping
  • hypothesis testing
  • deep learning models end-to-end
  • data preparation
  • evaluation
  • software engineering

Nice to have

  • diffusion models
  • neural codecs
  • flow-matching models
  • autoregressive decoders
  • speech-to-speech
  • text-to-speech (TTS) systems
  • publications

What the JD emphasized

  • production-level quality
  • real-time speech generation

Other signals

  • generative AI
  • synthetic voices
  • speech generation
  • model optimization
  • real-time generation