Multimodal Model Training and Inference Optimization Engineer

ByteDance · Big Tech · Seattle, WA · R&D

Seeking an experienced engineer to optimize large-scale multimodal AI model training and inference pipelines, focusing on distributed training strategies, performance benchmarking, and acceleration for generative AI and CV/Multimodal Understanding applications.

What you'd actually do

  1. Optimize large model training pipelines to improve efficiency, speed, and scalability.
  2. Develop and improve distributed training strategies such as data parallelism, model parallelism, and pipeline parallelism, along with communication optimizations, to accelerate model training.
  3. Benchmark and profile deep learning models to identify performance bottlenecks and optimize computational resources.
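The benchmarking step above can be sketched in miniature. This is an illustrative stdlib-only example (the workload and helper names are hypothetical, not ByteDance tooling): time a deliberately slow kernel over several runs so a bottleneck shows up clearly before optimization work begins.

```python
import time

def benchmark(fn, *args, repeats=5):
    """Return the best wall-clock time over several runs (reduces timer noise)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

def naive_matmul(a, b):
    """Deliberately slow pure-Python matrix multiply -- the 'bottleneck' under test."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

if __name__ == "__main__":
    size = 40
    a = [[1.0] * size for _ in range(size)]
    b = [[2.0] * size for _ in range(size)]
    t = benchmark(naive_matmul, a, b)
    print(f"naive_matmul: {t * 1e3:.2f} ms for {size}x{size}")
```

In practice the same loop structure applies with GPU-aware tools (e.g. a profiler that records kernel times) and a vectorized baseline to compare against; taking the best of several runs rather than the mean filters out warm-up and scheduler jitter.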

Skills

Required

  • Python
  • C++
  • CUDA
  • PyTorch
  • Megatron
  • DeepSpeed
  • distributed training
  • transformers
  • diffusion models

Nice to have

  • M.S. or Ph.D. in Computer Science, Electrical Engineering, Artificial Intelligence, or a related field
  • publications at conferences such as MLSys, NeurIPS, ICLR, or ICML
  • Strong communication and teamwork skills
  • Self-motivation and strong problem-solving skills
  • Experience implementing and optimizing complex, performance-critical systems

What the JD emphasized

  • expertise in optimizing AI model training and inference
  • distributed training/inference techniques and acceleration
  • large-scale generative AI models
  • performance, scalability, and deployment
  • optimizing large model training pipelines for efficiency, speed, and scalability
  • benchmarking and profiling deep learning models to identify bottlenecks and optimize computational resources
  • Python, C++, and CUDA
  • PyTorch, Megatron, and DeepSpeed
  • transformers and diffusion models

Other signals

  • optimizing large-scale generative AI models
  • working at the cutting edge of AI efficiency