Meta is seeking a creative, skilled, and motivated Research Scientist to advance the state of the art in multi-modal understanding. You will develop models that reason across vision, language, and other modalities to enable richer AI experiences across Meta's family of apps and products. You will collaborate with research scientists, software engineers, and data scientists to design technical solutions in a fast-paced multidisciplinary environment.
Responsibilities
- Develop and advance multi-modal models that integrate vision, language, audio, and other modalities
- Research novel architectures and training methods for cross-modal reasoning and understanding
- Design and prototype interactive experiences that leverage multi-modal AI capabilities
- Collaborate across teams to develop concepts that advance the entire research pipeline (hardware, software, data collection, machine learning, etc.)
- Publish research findings at top-tier conferences and contribute to the broader research community
Qualifications
- Currently has, or is in the process of obtaining, a Bachelor's degree in Computer Science, Computer Engineering, a relevant technical field, or equivalent practical experience. Degree must be completed prior to joining Meta.
- Currently has, or is in the process of obtaining, a PhD degree in Computer Science, Machine Learning, or a relevant technical field. Degree must be completed prior to joining Meta.
- Experience in multi-modal learning, combining vision, audio, language, or related areas
- Experience working with PyTorch or TensorFlow
- Experience with transformer architectures and large-scale model training
- Technical knowledge across machine learning, deep learning, and statistical modeling
- Must obtain work authorization in country of employment at the time of hire, and maintain ongoing work authorization during employment
- First-authored publications at leading conferences such as NeurIPS, ICML, and CVPR, or similar
- Experience with large language models (LLMs) and their integration with other modalities
- Experience transferring multi-modal research into shipping products
- Experience working and communicating cross-functionally in a team environment
- Research experience in vision-language models, multi-modal transformers, or cross-modal representation learning