Research Scientist in Large Multimodal Models Applications - San Jose

ByteDance · Big Tech · San Jose, CA · R&D

Research Scientist role focusing on applying large multimodal models to multimedia applications like video understanding, processing, and compression. Involves model training, tuning, and performance optimization, with a strong emphasis on academic research and publication.

What you'd actually do

Contribute to the research and development of multimedia algorithms based on large multimodal models, including but not limited to video understanding, quality assessment, video processing and enhancement, and video compression.
Optimize and accelerate algorithms related to large multimodal models.
Explore the implementation of large multimodal models in multimedia applications such as short video streaming, video transcoding, and live streaming.
Conduct advanced academic research on large multimodal models and publish findings in international conferences and journals.

Skills

Required

Diffusion models
LLMs
Large multimodal models
Model training
Model tuning
Model application
Computer vision algorithms
GAN
VAE
AIGC

Nice to have

NLP algorithms
RL algorithms
Transformer
BERT
GPT
Impactful project leadership
Publication record

What the JD emphasized

Track record of research excellence
Publishing findings in international conferences and journals
Proficiency in Diffusion, LLM, and other advanced large multimodal models
Experience with model training, tuning, and application
Familiarity with computer vision (CV) algorithms
A history of leading impactful projects in large multimodal models or publishing in conferences (NeurIPS, ICLR, ICML, etc.) is advantageous

Other signals

large multimodal models
video understanding
video processing
video compression
model training
model tuning
computer vision
AIGC

Team Introduction

Multimedia Lab's mission is to promote cutting-edge research in multimedia (including, but not limited to, image/video data processing, compression, and transmission) and to transfer technologies into our products to better serve our hundreds of millions of users. We are looking for exceptional individuals from all areas of multimedia processing, compression, and transmission who have a track record of research excellence, a passion for shaping the future of multimedia processing, and the potential to become outstanding leaders in the field.

Requirements

Minimum Qualification

Proficiency in Diffusion, LLM, and other advanced large multimodal models; experience with model training, tuning, and application.
Familiarity with computer vision (CV) algorithms, including GAN, VAE, and Diffusion for AIGC.

Preferred Qualification

Experience with NLP and RL algorithms, and knowledge of models such as Transformer, BERT, and GPT is preferred.
A history of leading impactful projects in large multimodal models or publishing in conferences (NeurIPS, ICLR, ICML, etc.) is advantageous.