Research Scientist

Meta · Big Tech · Pittsburgh, PA +1

Research Scientist role at Meta Reality Labs focused on multi-modal AI research that integrates vision, language, audio, and sensor data to build next-generation AI-powered interactions. The role involves leading research projects, developing and optimizing multi-modal models, and transitioning research into production, with a focus on cross-modal alignment and fusion.

What you'd actually do

  1. Lead the design, development, and optimization of multi-modal models that integrate vision, language, audio, and sensor inputs
  2. Set technical direction for multi-modal research projects
  3. Conduct research and experiments to improve cross-modal alignment and fusion strategies
  4. Collaborate with cross-functional teams (engineering, HCI, product) to transition multi-modal research into production
  5. Explore and adopt novel model optimization, quantization, and efficiency techniques

Skills

Required

  • PhD in Computer Science, Machine Learning, Computer Vision, or a related technical field
  • Expertise in multi-modal learning (architecture design, training, cross-modal alignment)
  • Programming experience in Python
  • Hands-on experience with deep learning frameworks (PyTorch)
  • Experience developing machine learning models at scale
  • 5+ years of research experience with multiple modalities (vision, language, audio, sensor data)
  • Deep expertise in vision-language models, cross-modal attention mechanisms, or contrastive learning
  • First-authored publications at peer-reviewed AI conferences
  • Experience with on-device or edge multi-modal model optimization (quantization, sparsity, distillation)
  • Demonstrated software engineering experience
  • Experience bringing multi-modal AI products from research to production
  • Proven track record of developing multi-modal models that fuse vision, language, and/or audio

Nice to have

  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience

What the JD emphasized

  • multi-modal understanding
  • vision, language, audio, and sensor modalities
  • cross-modal alignment and fusion strategies
  • transition multi-modal research into production
  • multi-modal learning
  • vision-language models
  • First-authored publications at peer-reviewed AI conferences
  • on-device or edge multi-modal model optimization
  • bringing multi-modal AI products from research to production
  • developing multi-modal models that fuse vision, language, and/or audio for real-world applications
