(Senior) Staff Research Scientist | Voice

DeepL · AI Frontier · Munich, Germany · Research

Senior Staff Research Scientist at DeepL working on voice translation. The role leads research and development across ASR, MT, TTS, and speech-to-speech translation, with an emphasis on real-time, low-latency streaming systems. Responsibilities include designing, training, and optimizing large-scale models; improving cascaded translation pipelines; developing TTS models; building end-to-end systems; and owning the full lifecycle from prototyping to production deployment. The role also requires close collaboration with engineering, driving inference efficiency, and establishing best practices for evaluation and monitoring.

What you'd actually do

  1. Lead hands-on research and development across ASR, MT, TTS, and speech-to-speech translation for real-time voice products.
  2. Design, train, and optimize large-scale ASR models for multilingual accuracy, robustness, and ultra-low-latency streaming.
  3. Improve cascaded translation pipelines end to end: segmentation, ASR→MT interfaces, streaming MT inference, and incremental decoding.
  4. Develop and refine real-time TTS models with natural prosody, stable speaker characteristics, and fast inference.
  5. Build and experiment with end-to-end and LLM-based speech-to-speech translation systems, including streaming and one-shot approaches.

Skills

Required

  • Deep expertise in speech, audio, or multilingual ML, particularly in ASR, MT, TTS, end-to-end ST, or large speech models.
  • Hands-on builder mindset: enjoys training models, running experiments, debugging pipelines, and integrating ML systems into production.
  • Strong understanding of real-time streaming constraints and how to design models that operate reliably at low latency.
  • Experience shipping ML models to production, maintaining them at scale, and working with engineers on deployment, monitoring, and serving.
  • Ability to lead complex research efforts while staying grounded in product impact, user experience, and real-world performance.
  • Strong coding and experimentation skills (Python, PyTorch/JAX, audio processing libraries).
  • Ability to communicate clearly, collaborate across teams, and align research work with product and engineering priorities.
  • Proven experience mentoring others and elevating technical quality across a fast-moving, applied research team.

What the JD emphasized

  • ultra-low-latency streaming
  • real-time TTS models
  • fast inference
  • end-to-end and LLM-based speech-to-speech translation systems
  • streaming and one-shot approaches
  • real-time systems
  • reliability, uptime, and quality at scale
  • inference efficiency
  • model serving
  • voice UX
  • robustness to real-world acoustic conditions
  • evaluation, reproducibility, monitoring, and continuous model improvement in production
  • real-time streaming constraints
  • low latency
  • shipping ML models to production
  • maintaining them at scale
  • deployment, monitoring, and serving

Other signals

  • leading scientific innovation
  • define long-term scientific strategy
  • prototype rapidly
  • run large-scale experiments
  • drive breakthroughs all the way into production
  • work across ASR, MT, TTS, streaming inference, and large speech models
  • leading both cascaded and emerging end-to-end speech-translation approaches