AI Research Scientist

Meta · Big Tech · Redmond, WA

Research Scientist at Meta focused on advancing multi-modal understanding by developing models and systems that reason across text, images, video, and audio, with the goal of impacting products used by billions.

What you'd actually do

Conduct research on multi-modal learning, including vision-language models, audio-visual understanding, and cross-modal reasoning
Develop novel architectures and training methodologies for models that integrate and reason across multiple modalities
Design and execute experiments to evaluate multi-modal model capabilities and identify areas for improvement
Publish research findings at top-tier conferences and contribute to Meta's research community
Collaborate with cross-functional teams to translate research innovations into product applications

Skills

Required

PhD in Computer Science, Machine Learning, Artificial Intelligence, or a related field
Experience with multi-modal learning, vision-language models, or cross-modal representation learning demonstrated through publications or projects
Experience programming in Python
Experience with deep learning frameworks such as PyTorch
Experience with large-scale model training and distributed computing
Experience building end-to-end multi-modal systems from research to production

Nice to have

Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
Experience with video understanding or audio-visual learning
Experience with large language models, vision transformers, or foundation models

What the JD emphasized

Publications at venues such as NeurIPS, ICML, ICLR, CVPR, ACL, or EMNLP focused on multi-modal learning

Other signals

multi-modal understanding
reason across multiple modalities
cutting-edge research
impact billions of users

Full job description

Meta is seeking a Research Scientist to advance the field of multi-modal understanding. This role focuses on developing models and systems that can reason across multiple modalities including text, images, video, and audio. You will work on cutting-edge research to enable AI systems to perceive, interpret, and generate content across diverse data types, contributing to products that impact billions of users worldwide.

Responsibilities

Conduct research on multi-modal learning, including vision-language models, audio-visual understanding, and cross-modal reasoning
Develop novel architectures and training methodologies for models that integrate and reason across multiple modalities
Design and execute experiments to evaluate multi-modal model capabilities and identify areas for improvement
Publish research findings at top-tier conferences and contribute to Meta's research community
Collaborate with cross-functional teams to translate research innovations into product applications
Mentor and guide other researchers on multi-modal AI projects

Qualifications

Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
PhD in Computer Science, Machine Learning, Artificial Intelligence, or a related field
Experience with multi-modal learning, vision-language models, or cross-modal representation learning demonstrated through publications or projects
Experience programming in Python and with deep learning frameworks such as PyTorch
Experience with large-scale model training and distributed computing
Experience building end-to-end multi-modal systems from research to production
Experience with video understanding or audio-visual learning
Publications at venues such as NeurIPS, ICML, ICLR, CVPR, ACL, or EMNLP focused on multi-modal learning
Experience with large language models, vision transformers, or foundation models