Research Scientist, Intelligent Editing (multimodality)

ByteDance · Big Tech · Seattle, WA · R&D

Research Scientist role focusing on multimodal understanding, vision and language, large-scale training, and RLHF for intelligent editing within ByteDance's Intelligent Creation Team. The role involves cutting-edge research and transferring technologies to products.

What you'd actually do

  1. Conduct cutting-edge research and development in computer vision and machine learning, especially in areas such as multimodal understanding, vision and language, and large-scale training.
  2. Transfer advanced technologies to ByteDance products.
  3. Explore new products with artificial intelligence technology at its core.

Skills

Required

  • computer vision
  • machine learning
  • multimodal understanding
  • vision and language
  • language models
  • large-scale training
  • RLHF
  • C++
  • Python

Nice to have

  • publications in top-tier conferences (CVPR, ECCV, ICCV, NeurIPS, ICLR, ICML, ACL, EMNLP, COLING)
  • video highlight detection and slicing
  • audio/music understanding
  • image/video captioning
  • retrieval
  • VQA
  • intelligent editing

What the JD emphasized

  • multimodal understanding
  • vision and language
  • large-scale training
  • RLHF
