Research Engineer, Computer Vision

Meta · Big Tech · Pittsburgh, PA

Research Engineer focused on multi-modal understanding. The role develops algorithms that integrate computer vision with language, audio, and sensor data, drives the curation of multi-modal datasets and annotation pipelines, and brings the resulting systems to production for immersive applications.

What you'd actually do

  1. Design and implement multi-modal understanding systems that combine vision, language, and other sensory inputs to enable richer contextual awareness
  2. Develop algorithms for cross-modal learning, fusion, and reasoning to improve human-AI interaction
  3. Lead the curation and management of multi-modal datasets, ensuring data quality and diversity across vision, language, and sensor modalities
  4. Design and oversee ground truth annotation workflows and quality assurance processes for multi-modal data
  5. Independently deliver medium-to-large features that span multiple tasks, with minimal to no guidance

Skills

Required

  • C++
  • Python
  • PyTorch
  • TensorFlow
  • deep learning frameworks
  • Collaboration with cross-functional teams

Nice to have

  • Master's degree in Computer Science, Computer Vision, Machine Learning, or related field
  • vision-language models
  • multi-modal transformers
  • Publications or contributions to multi-modal understanding research
  • large language models
  • data curation
  • annotation tools
  • ground truth labeling pipelines

What the JD emphasized

  • Degree must be completed prior to joining Meta

Other signals

  • multi-modal understanding
  • vision-language models
  • production systems