About the Team
Established in 2023, the ByteDance Seed team is dedicated to pioneering new paths toward artificial general intelligence. We aspire to advance the frontier of intelligence to drive progress for both technology and society.
With a long-term vision for the AI sector, the Seed team's research spans MLLM, GenMedia, AI for Science, and Robotics. We maintain a global presence with laboratories and career opportunities across China, Singapore, and the United States. To date, we have launched industry-leading general foundation models and cutting-edge multimodal capabilities. Our technology powers over 50 application scenarios — including Doubao, Jimeng, TRAE, Dola and Dreamnia — and serves enterprise customers through Volcano Engine and BytePlus. Third-party data shows that the Doubao App ranks first in user volume in the Chinese market, while Doubao foundation models lead the industry in average daily token consumption.
The mission of the Seed Speech team is to enrich interactive and creative processes through the application of multimodal speech technologies. The team focuses on the forefront of research and product development in speech and audio, music, natural language understanding, and multimodal deep learning.
Responsibilities
- Conduct research and development in speech/audio foundation models.
- Collaborate with cross-functional teams to identify key research areas and contribute to the development of innovative speech/audio models.
- Work with product development teams to integrate research findings into practical applications for ByteDance and other platforms.
- Collaborate on team-driven projects to address complex challenges and enhance the overall effectiveness of the research team.
Requirements
Minimum Qualifications
- Master's or PhD in computer science, mathematics, engineering, or a related field
- 3+ years of experience in one or more areas of machine learning and deep learning, including but not limited to: automatic speech recognition, automatic speech translation, speech/audio self-supervised learning and foundation models, speaker recognition and verification, speech emotion recognition, multimodal foundation models, and large language model pre-training and fine-tuning
Preferred Qualifications
- Publications in leading ML/DL venues such as NeurIPS, ICLR, ICML, and AAAI, and in speech venues such as ICASSP, ASRU, and Interspeech
- Deep understanding of large language models
- Familiar with distributed computing and large-scale model training
- Familiar with deep learning frameworks such as TensorFlow and PyTorch
- Familiar with engineering principles and best practices
- Highly competent in algorithms and programming; strong coding skills in C/C++ and Python
- Ability to work collaboratively in a fast-paced, cross-functional environment