Research Scientist, Intelligent Editing (multimodality)

ByteDance · Big Tech · Seattle, WA · R&D

Research Scientist role focusing on multimodal understanding, vision and language, large-scale training, and RLHF for intelligent editing within ByteDance's Intelligent Creation Team. The role involves cutting-edge research and transferring technologies to products.

What you'd actually do

Conduct cutting-edge research and development in computer vision and machine learning, especially in areas such as multimodal understanding, vision and language, and large-scale training.
Transfer advanced technologies to ByteDance products.
Explore new products with artificial intelligence technology at their core.

Skills

Required

computer vision
machine learning
multimodal understanding
vision and language
language models
large-scale training
RLHF
C++
Python

Nice to have

publications in top-tier conferences (CVPR, ECCV, ICCV, NeurIPS, ICLR, ICML, ACL, EMNLP, COLING)
video highlight detection and slicing
audio/music understanding
image/video captioning
retrieval
VQA
intelligent editing

What the JD emphasized

multimodal understanding
vision and language
large-scale training
RLHF

About the team

The Intelligent Creation Team is the AI, special effects, and audio-video creation technology team, responsible for core technology and business development. It covers a variety of technical fields, including deep learning, computer vision, graphics, speech, recording and editing, special effects, and client and server engineering. It provides cutting-edge content understanding, content creation, interactive experience, and consumption capabilities, as well as industry solutions, to other business lines within the company and to external partners.

Responsibilities

Conduct cutting-edge research and development in computer vision and machine learning, especially in areas such as multimodal understanding, vision and language, and large-scale training.
Transfer advanced technologies to ByteDance products.
Explore new products with artificial intelligence technology at their core.

Requirements

Minimum Qualifications

At least 1 year of research and practical experience in one or more areas of computer vision, including but not limited to:
Experience in multimodal understanding, such as video highlight detection and slicing or audio/music understanding.
Experience in vision and language, such as image/video captioning, retrieval, and VQA.
Experience with language models and applying them to various downstream tasks, especially intelligent editing.
Experience in large-scale training and RLHF.

Preferred Qualifications

Publications in venues such as CVPR, ECCV, ICCV, NeurIPS, ICLR, ICML, ACL, EMNLP, or COLING.
Highly competent in algorithms and programming, with strong coding skills in C/C++ and Python.
Ability to work and collaborate well with team members.
Ability to work independently.