Research Scientist - Foundation Model, Speech Understanding

ByteDance · Big Tech · San Jose, CA · R&D

Research Scientist position focused on foundation models for speech understanding, with emphasis on pre-training and fine-tuning. The role involves research and development, collaboration with cross-functional teams, and integration of research findings into practical applications. The team works on multimodal speech technologies, including ASR, speech translation, self-supervised learning, and LLM pre-training/fine-tuning.

What you'd actually do

  1. Conduct research and development in speech/audio foundation models.
  2. Collaborate with cross-functional teams to identify key research areas and contribute to the development of innovative speech/audio models.
  3. Work with product development teams to integrate research findings into practical applications for ByteDance and other platforms.
  4. Collaborate on team-driven projects to address complex challenges and enhance the overall effectiveness of the research team.

Skills

Required

  • Master's or PhD in computer science, mathematics, engineering, or a related field
  • 3+ years of experience in machine learning and deep learning
  • Automatic Speech Recognition
  • Automatic Speech Translation
  • Speech/audio self-supervised learning and foundation models
  • Speaker recognition and verification
  • Speech emotion recognition
  • Multimodal foundation models
  • Large Language Model pre-training and fine-tuning
  • Python
  • C++

Nice to have

  • Publications in recognized ML/DL venues
  • Deep understanding of large language models
  • Distributed computing and large-scale model training
  • TensorFlow
  • PyTorch
  • Engineering principles and best practices
  • Algorithms and programming

What the JD emphasized

  • foundation models
  • speech understanding
  • multimodal
