AI Research Scientist, Video Generation and Post Training, FAIR

Meta · Big Tech · Bellevue, WA +2

Research Scientist role focused on video generation and post-training of large-scale multimodal models within Meta's Fundamental AI Research (FAIR) team. The role involves developing generative models, optimizing post-training paradigms, and contributing to frontier models for next-generation AI systems.

What you'd actually do

Conduct fundamental and applied research in video generation, including generative models, video synthesis, and multimodal learning
Develop and optimize post-training paradigms for large-scale video and multimodal models, improving their performance, robustness, and generalization
Collaborate with teams across Meta to build perceptual foundations for real-time embodied agents and conversational AI
Contribute to the development and deployment of frontier models (e.g., Llama, LMMs) and push the boundaries of video and media generation

Skills

Required

video generation
computer vision
multimodal AI
large-scale model training
post-training optimization techniques
data curation
video synthesis
multimodal fusion techniques
video-language models
complex problem solving
interdisciplinary team collaboration

Nice to have

PhD or equivalent experience
expertise in video generation
expertise in computer vision
expertise in multimodal AI
experience with large-scale model training
experience with post-training optimization techniques
experience with data curation
experience with video synthesis
experience with multimodal fusion techniques
experience with video-language models
experience solving complex problems
experience working and communicating cross-functionally in a collaborative, interdisciplinary team environment

What the JD emphasized

publication record
Proven track record of achieving significant results, as demonstrated by grants, fellowships, patents, or publications at leading workshops, journals, or conferences (e.g., NeurIPS, ICML, ICLR, CVPR, ICCV)

Other signals

video generation
post-training
large-scale models
multimodal learning
frontier models

Full job description

Meta is seeking a Research Scientist to join the Fundamental AI Research (FAIR) team within Meta Superintelligence Labs (MSL). Our mission is to advance the science of intelligence and develop technologies that push the boundaries of AI. We are looking for researchers with expertise in video generation and post-training of large-scale models to help build the perceptual and generative foundations for next-generation AI systems. This role offers the opportunity to collaborate with a highly interdisciplinary team of scientists, engineers, and cross-functional partners, leveraging cutting-edge technology, resources, and research facilities.

Responsibilities

Conduct fundamental and applied research in video generation, including generative models, video synthesis, and multimodal learning
Develop and optimize post-training paradigms for large-scale video and multimodal models, improving their performance, robustness, and generalization
Collaborate with teams across Meta to build perceptual foundations for real-time embodied agents and conversational AI
Contribute to the development and deployment of frontier models (e.g., Llama, LMMs) and push the boundaries of video and media generation

Qualifications

Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
PhD or equivalent experience in Computer Science, Electrical Engineering, or a related field
Demonstrated expertise in video generation, computer vision, or multimodal AI
Experience with large-scale model training, post-training optimization techniques, and data curation
Publication record in relevant fields
Demonstrated research and software engineering experience via internships, industry or academic work experience, coding competitions, or widely used contributions in open source repositories (e.g., GitHub)
Experience with video generation, video synthesis, or multimodal fusion techniques
Experience with video-language models and architectures relevant to video generation and post training
Proven track record of achieving significant results, as demonstrated by grants, fellowships, patents, or publications at leading workshops, journals, or conferences (e.g., NeurIPS, ICML, ICLR, CVPR, ICCV)
Experience solving complex problems and evaluating alternative solutions, tradeoffs, and perspectives to determine a path forward
Experience working and communicating cross-functionally in a collaborative, interdisciplinary team environment