Senior Research Engineer - Interactive Avatars

Synthesia · Multimodal · Europe · Research and Development

A Senior Research Engineer role focused on avatar-centric interactive video diffusion models, with the aim of turning breakthrough ideas into real product capabilities. Responsibilities include adapting diffusion models to new conditioning signals, developing real-time streaming methods, working on the perceptual layer for user interaction, improving visual quality, and building evaluation frameworks. The role requires a strong ML/CV background, PyTorch proficiency, and solid Python engineering skills.

What you'd actually do

  1. Adapt diffusion models to incorporate diverse conditioning signals (e.g., audio, motion, interaction cues).
  2. Develop methods for streaming infinitely long video sequences at real-time rates.
  3. Work on the perceptual layer of interactive agents, including understanding user audio and generating appropriate contextual reactions.
  4. Improve lip-sync accuracy, motion realism, and overall visual quality in video diffusion models.
  5. Build robust evaluation frameworks and test suites to enable continuous quality tracking.

Skills

Required

  • machine learning (e.g., diffusion models, GANs, VAEs)
  • computer vision
  • diffusion models
  • PyTorch
  • modern ML frameworks and tooling
  • Python engineering
  • Git and version control
  • clean, maintainable research code

Nice to have

  • audio-conditioned video diffusion models
  • video DiT architectures
  • full model development pipeline end to end
  • publication record in areas such as world models, interactive agents, or video diffusion models

What the JD emphasized

  • avatar-centric interactive video diffusion models
  • real-time rates
  • understanding user audio and generating appropriate contextual reactions
  • lip-sync accuracy
  • motion realism
  • visual quality
  • evaluation frameworks
  • world models
  • interactive human/agent modeling
  • diffusion models
  • video diffusion models
  • full model development pipeline end to end

Other signals

  • Generative AI
  • avatar-centric interactive video