What you'd actually do

Adapt diffusion models to incorporate diverse conditioning signals (e.g., audio, motion, interaction cues).

Develop methods for streaming infinitely long video sequences at real-time rates.

Work on the perceptual layer of interactive agents, including understanding user audio and generating appropriate contextual reactions.

Improve lip-sync accuracy, motion realism, and overall visual quality in video diffusion models.

Build robust evaluation frameworks and test suites to enable continuous quality tracking.

Skills

Required

ML (e.g., diffusion, GANs, VAEs)
computer vision
diffusion models
PyTorch
modern ML frameworks and tooling
Python engineering
git and version control
clean, maintainable research code

Nice to have

audio-conditioned video diffusion models
video DiT architectures
full model development pipeline end to end
publication record in areas such as world models, interactive agents, or video diffusion models

What the JD emphasized

avatar-centric interactive video diffusion models

real-time rates

user audio and generating appropriate contextual reactions

lip-sync accuracy

motion realism

visual quality

evaluation frameworks

world models

interactive human/agent modeling

diffusion models

video diffusion models

full model development pipeline end to end

Synthesia is the world’s leading AI video platform for business, used by over 90% of the Fortune 100. Founded in 2017, the company is headquartered in London, with offices and teams across Europe and the US.

As AI continues to shape the way we live and work, Synthesia develops products to enhance visual communication and enterprise skill development, helping people work better and stay at the center of successful organizations.

Following our recent Series E funding round, where we raised $200 million, our valuation stands at $4 billion. Our total funding exceeds $530 million from premier investors including Accel, NVentures (Nvidia's VC arm), Kleiner Perkins, GV, and Evantic Capital, alongside the founders and operators of Stripe, Datadog, Miro, and Webflow.

About the role

As a Research Engineer, you will join a team of 40+ Researchers and Engineers within the R&D Department working on cutting edge challenges in the Generative AI space, with a focus on avatar-centric interactive video diffusion models. Within the team you’ll have the opportunity to work on the applied side of our research efforts and directly impact our solutions that are used worldwide by over 60,000 businesses.

This is a unique opportunity for experts in machine learning and diffusion models to shape the future of AI video agents that can think, act, and react like humans. As part of our Interactive Avatars Team, you’ll work on cutting-edge research with a clear focus on turning breakthrough ideas into real product capabilities. You’ll join a team that moves fast, iterates often, and builds models that ship and make a meaningful impact. Example tasks and responsibilities include:

Adapt diffusion models to incorporate diverse conditioning signals (e.g., audio, motion, interaction cues).
Develop methods for streaming infinitely long video sequences at real-time rates.
Work on the perceptual layer of interactive agents, including understanding user audio and generating appropriate contextual reactions.
Improve lip-sync accuracy, motion realism, and overall visual quality in video diffusion models.
Build robust evaluation frameworks and test suites to enable continuous quality tracking.
Collaborate closely with our data team to define data needs and ensure high-quality datasets.
Stay up to date with research in world models, interactive human/agent modeling, diffusion models, and related areas.

What we're looking for:

Comfortable owning and executing on the responsibilities listed above.
Strong ML (e.g., diffusion, GANs, VAEs) and computer vision background with relevant industry experience.
Hands-on experience with diffusion models (ideally avatar-centric or video-focused) and up to date with recent advances.
Proficient in PyTorch and familiar with modern ML frameworks and tooling.
Strong Python engineering skills, confident with git and version control, and a commitment to clean, maintainable research code.
Outcome-driven, detail-oriented, and motivated to push state-of-the-art research into real product impact.
Clear communicator of hypotheses, experiments, and results.

What will make you stand out:

Experience with audio-conditioned video diffusion models and deep knowledge of recent video DiT architectures.
Demonstrated ability to own the full model development pipeline end to end, from data preparation to model design, training, and evaluation.
A strong publication record in areas such as world models, interactive agents, or video diffusion models.

Why join us?

We’re living in the golden age of AI. The next decade will yield the next iconic companies, and we dare to say we have what it takes to become one. Here’s why:

Our culture

At Synthesia we’re passionate about building, not talking, planning or politicising. We strive to hire the smartest, kindest and most unrelenting people and let them do their best work without distractions. Our work principles serve as our charter for how we make decisions, give feedback and structure our work to empower everyone to go as fast as possible. You can find out more about these principles here.

Serving 50,000+ customers (and 50% of the Fortune 500)

We’re trusted by leading brands such as Heineken, Zoom, Xerox, McDonald’s and more. Read stories from happy customers and what 1,200+ people say on G2.

Proprietary AI technology

Since 2017, we’ve been pioneering advancements in Generative AI. Our AI technology is built in-house by a team of world-class AI researchers and engineers. Learn more about our AI Research Lab and the team behind it.

AI Safety, Ethics and Security

AI safety, ethics, and security are fundamental to our mission. While the full scope of Artificial Intelligence's impact on our society is still unfolding, our position is clear: People first. Always. Learn more about our commitments to AI Ethics, Safety & Security.

The good stuff...

Competitive compensation (salary + stock options + bonus)
Hybrid work setting with an office in London, Amsterdam, Zurich, Munich, or remote in Europe.
25 days of annual leave + public holidays
Great company culture with the option to join regular planning and socials at our hubs
Other benefits depending on your location

You can see more about Who we are and How we work here: https://www.synthesia.io/careers



Senior Research Engineer - Interactive Avatars
