Research Scientist, Multi-modal

Meta · Big Tech · Pittsburgh, PA

Research Scientist focused on advancing state-of-the-art multi-modal models that integrate vision, language, audio, and other modalities. The role involves researching novel architectures and training methods for cross-modal reasoning, designing interactive experiences, and publishing findings. Requires a PhD and experience in multi-modal learning, PyTorch/TensorFlow, transformer architectures, and large-scale model training, with a strong publication record.

What you'd actually do

  1. Develop and advance multi-modal models that integrate vision, language, audio, and other modalities
  2. Research novel architectures and training methods for cross-modal reasoning and understanding
  3. Design and prototype interactive experiences that leverage multi-modal AI capabilities
  4. Collaborate across teams to develop concepts that advance the entire research pipeline (hardware, software, data collection, machine learning, etc.)
  5. Publish research findings at top-tier conferences and contribute to the broader research community

Skills

Required

  • PhD degree in Computer Science, Machine Learning, or relevant technical field
  • Experience in multi-modal learning, combining vision, audio, language, or related areas
  • Experience working with PyTorch or TensorFlow
  • Experience with transformer architectures and large-scale model training
  • Technical knowledge across machine learning, deep learning, and statistical modeling
  • First-authored publications at leading conferences such as NeurIPS, ICML, CVPR, or similar venues
  • Experience with large language models (LLMs) and their integration with other modalities
  • Experience transferring multi-modal research into shipping products
  • Research experience in vision-language models, multi-modal transformers, or cross-modal representation learning

Nice to have

  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • Experience working and communicating cross-functionally in a team environment

What the JD emphasized

  • Degree must be completed prior to joining Meta
  • First-authored publications at leading conferences such as NeurIPS, ICML, CVPR, or similar venues

Other signals

  • multi-modal learning
  • transformer architectures
  • large-scale model training
  • cross-modal reasoning
  • vision-language models