Research Scientist Intern, Photorealistic Telepresence (PhD)

Meta · Big Tech · Sausalito, CA +2

Research Scientist Intern at Meta focused on photorealistic telepresence and autonomous social agents in AR/VR. The role involves generative AI for image/video synthesis, digital human motion, social signal encoding, face/body reconstruction, and multimodal LLMs (speech-to-speech, audio-visual). Requires PhD in a related field, ML experience, deep learning frameworks, Python, and a track record of publications/patents.

What you'd actually do

Solve research problems in enabling photorealistic telepresence and autonomous social agents.
Collaborate with and support other researchers across various disciplines.
Communicate the research agenda, progress, and results.

Skills

Required

PhD in Computer Science, Computer Vision, Computer Graphics, Robotics, Machine Learning, or related field
Experience with solving “inverse problems” in imaging emphasizing modeling and algorithm development
2+ years of experience with Machine Learning for solving computer vision and computer graphics problems
Experience with deep learning frameworks such as PyTorch and TensorBoard
Experience with scientific programming languages such as Python
Proven track record of achieving significant results as demonstrated by patents and first-authored publications at leading workshops or conferences such as ICCV, CVPR, NeurIPS, SIGGRAPH, ICASSP, or similar
Intent to return to a degree program after the completion of the internship
Experience working and communicating cross-functionally in a team environment
Demonstrated software engineering experience via an internship, work experience, coding competitions, or widely used contributions in open source repositories (e.g. GitHub)
Experience with systems building in Python or C++
Experience with large-scale generative models such as LLMs and video diffusion models
Experience with Machine Learning for 3D data (such as meshes, point clouds, Gaussian splatting, and voxels)
Experience with Machine Learning for audio and visual synthesis
Work authorization in the country of employment (must be obtained at the time of hire and maintained during employment)

What the JD emphasized

PhD in Computer Science, Computer Vision, Computer Graphics, Robotics, Machine Learning, or related field
Experience with solving “inverse problems” in imaging emphasizing modeling and algorithm development
2+ years of experience with Machine Learning for solving computer vision and computer graphics problems
Experience with deep learning frameworks such as PyTorch and TensorBoard
Experience with scientific programming languages such as Python
Proven track record of achieving significant results as demonstrated by patents and first-authored publications at leading conferences

Other signals

Generative AI models for image and video synthesis
Motion and behavior synthesis for digital humans
VR/AR encoding of social signals
Face and body reconstruction and tracking
Multimodal LLMs, such as speech-to-speech LLMs and audio-visual LLMs

Full job description

Meta’s mission is to give people the power to build community and bring the world closer together. Through our family of apps and services, we're building a different kind of company that connects billions of people around the world, gives them ways to share what matters most to them, and helps bring people closer together. Whether we're creating new products or helping a small business expand its reach, people at Meta are builders at heart. Our global teams are constantly iterating, solving problems, and working together to empower people around the world to build community and connect in meaningful ways. Together, we can help people build stronger communities; we're just getting started.

Our org, XR Codec Interactions and Avatar, is looking for exceptional research interns to help us create a revolution in AR and VR: achieving true photorealistic telepresence, where you can be with anyone, anywhere, at any time. We have made numerous important advances, and it will take a diverse team with a wide spectrum of skills to accomplish this future. If you have expertise in Artificial Intelligence, Computer Vision, or Computer Graphics, we expect you will find the work here highly intriguing. We regularly publish our work at leading conferences and journals. Come join us as we make photorealistic telepresence and autonomous social agents in VR happen!

Available projects may include, but are not limited to:

Generative AI models for image and video synthesis
Motion and behavior synthesis for digital humans
VR/AR encoding of social signals
Face and body reconstruction and tracking
Multimodal LLMs, such as speech-to-speech LLMs and audio-visual LLMs

Our internships are twelve (12) to twenty-four (24) weeks long and we have various start dates throughout the year.

Responsibilities

Solve research problems in enabling photorealistic telepresence and autonomous social agents.
Collaborate with and support other researchers across various disciplines.
Communicate the research agenda, progress, and results.

Qualifications

Currently has, or is in the process of obtaining, a PhD in Computer Science, Computer Vision, Computer Graphics, Robotics, Machine Learning, or related field
Experience with solving “inverse problems” in imaging emphasizing modeling and algorithm development
2+ years of experience with Machine Learning for solving computer vision and computer graphics problems
Experience with deep learning frameworks such as PyTorch and TensorBoard
Experience with scientific programming languages such as Python
Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment
Proven track record of achieving significant results as demonstrated by patents and first-authored publications at leading workshops or conferences such as ICCV, CVPR, NeurIPS, SIGGRAPH, ICASSP, or similar
Intent to return to a degree program after the completion of the internship
Experience working and communicating cross-functionally in a team environment
Demonstrated software engineering experience via an internship, work experience, coding competitions, or widely used contributions in open source repositories (e.g. GitHub)
Experience with systems building in Python or C++
Experience with large-scale generative models such as LLMs and video diffusion models
Experience with Machine Learning for 3D data (such as meshes, point clouds, Gaussian splatting, and voxels)
Experience with Machine Learning for audio and visual synthesis