Research Scientist, Multi-modal

Meta · Big Tech · Pittsburgh, PA

Research Scientist focused on advancing state-of-the-art multi-modal models that integrate vision, language, audio, and other modalities. The role involves researching novel architectures and training methods for cross-modal reasoning, designing interactive experiences, and publishing findings. Requires a PhD and experience in multi-modal learning, PyTorch/TensorFlow, transformer architectures, and large-scale model training, with a strong publication record.

What you'd actually do

Develop and advance multi-modal models that integrate vision, language, audio, and other modalities
Research novel architectures and training methods for cross-modal reasoning and understanding
Design and prototype interactive experiences that leverage multi-modal AI capabilities
Collaborate across teams to develop concepts that advance the entire research pipeline (hardware, software, data collection, machine learning, etc.)
Publish research findings at top-tier conferences and contribute to the broader research community

Skills

Required

PhD degree in Computer Science, Machine Learning, or relevant technical field
Experience in multi-modal learning, combining vision, audio, language, or related areas
Experience working with PyTorch or TensorFlow
Experience with transformer architectures and large-scale model training
Technical knowledge across machine learning, deep learning, and statistical modeling
First-authored publications at leading conferences such as NeurIPS, ICML, and CVPR, or similar
Experience with large language models (LLMs) and their integration with other modalities
Experience transferring multi-modal research into shipping products
Research experience in vision-language models, multi-modal transformers, or cross-modal representation learning

Nice to have

Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
Experience working and communicating cross-functionally in a team environment

What the JD emphasized

Degree must be completed prior to joining Meta
First-authored publications at leading conferences such as NeurIPS, ICML, and CVPR, or similar

Other signals

multi-modal learning
transformer architectures
large-scale model training
cross-modal reasoning
vision-language models

Full job description

Meta is seeking a creative, skilled and motivated Research Scientist to advance the state-of-the-art in multi-modal understanding. You will work on developing models that reason across vision, language, and other modalities to enable richer AI experiences across Meta's family of apps and products. You will collaborate with research scientists, software engineers, and data scientists to design technical solutions in a fast-paced multidisciplinary environment.

Responsibilities

Develop and advance multi-modal models that integrate vision, language, audio, and other modalities
Research novel architectures and training methods for cross-modal reasoning and understanding
Design and prototype interactive experiences that leverage multi-modal AI capabilities
Collaborate across teams to develop concepts that advance the entire research pipeline (hardware, software, data collection, machine learning, etc.)
Publish research findings at top-tier conferences and contribute to the broader research community

Qualifications

Currently has, or is in the process of obtaining, a Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience. Degree must be completed prior to joining Meta
Currently has, or is in the process of obtaining, a PhD degree in Computer Science, Machine Learning, or relevant technical field. Degree must be completed prior to joining Meta
Experience in multi-modal learning, combining vision, audio, language, or related areas
Experience working with PyTorch or TensorFlow
Experience with transformer architectures and large-scale model training
Technical knowledge across machine learning, deep learning, and statistical modeling
Must obtain work authorization in country of employment at the time of hire, and maintain ongoing work authorization during employment
First-authored publications at leading conferences such as NeurIPS, ICML, and CVPR, or similar
Experience with large language models (LLMs) and their integration with other modalities
Experience transferring multi-modal research into shipping products
Experience working and communicating cross-functionally in a team environment
Research experience in vision-language models, multi-modal transformers, or cross-modal representation learning