Synthesia is the world’s leading AI video platform for business, used by over 90% of the Fortune 100. Founded in 2017, the company is headquartered in London, with offices and teams across Europe and the US.

As AI continues to shape the way we live and work, Synthesia develops products to enhance visual communication and enterprise skill development, helping people work better and stay at the center of successful organizations.

Following our recent Series E funding round, where we raised $200 million, our valuation stands at $4 billion. Our total funding exceeds $530 million from premier investors including Accel, NVentures (Nvidia's VC arm), Kleiner Perkins, GV, and Evantic Capital, alongside the founders and operators of Stripe, Datadog, Miro, and Webflow.

About the role

As a Research Engineer in our Video Pre-Training team, you will help build the next generation of production-grade foundation models for human-centric video generation.

You will join a highly focused team working at the intersection of large-scale generative modeling, distributed systems, and production engineering. Our mission is to develop and optimize video base models that power realistic, controllable, and emotionally expressive synthetic humans at scale.

This is not pure research. This is applied research with direct product impact.

You will work on advancing training recipes, scaling distributed systems, improving evaluation frameworks, and optimizing inference to ensure our models are high quality, stable, and efficient enough for real-world deployment. Your work will directly influence models used by tens of thousands of businesses worldwide.

What you’ll do

You will own and execute end-to-end research and engineering projects, from hypothesis to production impact. This includes:

Developing and scaling latent video diffusion models tailored for human-centric video generation
Designing conditioning mechanisms to improve control (pose, emotion, script, camera) without sacrificing fidelity
Advancing distributed training strategies (DDP, FSDP, DeepSpeed, sequence parallelism) under real compute constraints
Improving training stability at multi-node scale
Designing rigorous evaluation frameworks combining automated metrics and structured human evaluation
Optimizing inference for low latency, high resolution, and cost efficiency
Running controlled ablations and experiments to drive high-signal modeling decisions
Contributing to high engineering standards: reproducibility, experiment tracking, CI/CD, monitoring

You will be expected to move fast, run multiple hypotheses in parallel, identify signal early, and focus on outcomes rather than exploration for its own sake.

What we’re looking for

Must-have

Strong experience training deep learning models at scale
Strong Python and PyTorch skills
Hands-on experience with diffusion models (image domain required; video preferred)
Experience with large-scale multi-GPU / multi-node training
Good understanding of distributed training (DDP, FSDP, DeepSpeed or similar)
Ability to design controlled experiments and interpret noisy results

Nice-to-have

Experience with video diffusion models
Experience in avatar or human-centric generation
Familiarity with world / interactive models
Experience with GANs or VAEs
Experience optimizing inference systems for production

Our stack

Python, PyTorch, CUDA
DeepSpeed, distributed training & inference
Sequence parallelism
AWS, SLURM, Docker
GitHub, CI/CD pipelines

Who you are

You are research-driven but outcome-focused
You care about shipping, not just publishing
You can explore multiple ideas quickly and drop low-signal directions early
You communicate clearly and present results scientifically
You operate independently but collaborate actively across teams

Why join us?

Build production-scale video foundation models in a fast-growing Generative AI company
Work on human-centric video generation with real-world impact
Tackle hard problems in scaling, stability, and controllability
Influence the direction of next-generation synthetic human technology
Join a highly technical, high-ownership environment where your work ships

If you want to work on cutting-edge generative video models and see your research power real-world products, we’d love to talk.

Our culture

At Synthesia we’re passionate about building, not talking, planning or politicising. We strive to hire the smartest, kindest and most unrelenting people and let them do their best work without distractions. Our work principles serve as our charter for how we make decisions, give feedback and structure our work to empower everyone to go as fast as possible. You can find out more about these principles here.

Serving 50,000+ customers (and 50% of the Fortune 500)

We’re trusted by leading brands such as Heineken, Zoom, Xerox, McDonald’s and more. Read stories from happy customers and what 1,200+ people say on G2.

Proprietary AI technology

Since 2017, we’ve been pioneering advancements in Generative AI. Our AI technology is built in-house, by a team of world-class AI researchers and engineers. Learn more about our AI Research Lab and the team behind it.

AI Safety, Ethics and Security

AI safety, ethics, and security are fundamental to our mission. While the full scope of Artificial Intelligence's impact on our society is still unfolding, our position is clear: People first. Always. Learn more about our commitments to AI Ethics, Safety & Security.

The good stuff...

Competitive compensation (salary + stock options + bonus)
Fully remote from Europe, or a hybrid work setting with an office in London, Amsterdam, Zurich, or Munich
25 days of annual leave + public holidays
Great company culture with the option to join regular planning and socials at our hubs
Other benefits depending on your location

You can see more about Who we are and How we work here: https://www.synthesia.io/careers

Senior Research Engineer - Video Foundation Models (Pre-Training)
