Research Scientist in Large Multimodal Models Applications - San Jose

ByteDance · Big Tech · San Jose, CA · R&D

Research Scientist role focused on applying large multimodal models to multimedia applications such as video understanding, processing, and compression. The work involves model training, tuning, and performance optimization, with a strong emphasis on academic research and publication.

What you'd actually do

  1. Contribute to the research and development of multimedia algorithms based on large multimodal models, including but not limited to video understanding, quality assessment, video processing and enhancement, and video compression.
  2. Optimize and accelerate the performance of algorithms related to large multimodal models.
  3. Explore the implementation of large multimodal models in multimedia applications such as short video streaming, video transcoding, and live streaming.
  4. Conduct advanced academic research on large multimodal models and publish findings at international conferences and in journals.

Skills

Required

  • Diffusion models
  • LLMs
  • Large multimodal models
  • Model training
  • Model tuning
  • Model application
  • Computer vision algorithms
  • GANs
  • VAEs
  • AIGC

Nice to have

  • NLP algorithms
  • RL algorithms
  • Transformer
  • BERT
  • GPT
  • Impactful project leadership
  • Publication record

What the JD emphasized

  • track record of research excellence
  • publish findings in international conferences and journals
  • Proficiency in Diffusion, LLM, and other advanced large multimodal models
  • experience with model training, tuning, and application
  • Familiarity with computer vision (CV) algorithms
  • A history of leading impactful projects in large multimodal models or publishing in conferences (NeurIPS, ICLR, ICML, etc.) is advantageous

Other signals

  • large multimodal models
  • video understanding
  • video processing
  • video compression
  • model training
  • model tuning
  • computer vision
  • AIGC