Multimodal Model Training and Inference Optimization Engineer

ByteDance · Big Tech · Seattle, WA · R&D

Seeking an experienced engineer to optimize large-scale multimodal AI model training and inference pipelines, focusing on distributed training strategies, performance benchmarking, and acceleration for generative AI and CV/Multimodal Understanding applications.

What you'd actually do

  1. Optimize large model training pipelines to improve efficiency, speed, and scalability.
  2. Develop and improve distributed training strategies such as data parallelism, model parallelism, and pipeline parallelism, along with communication optimizations, to accelerate model training.
  3. Benchmark and profile deep learning models to identify performance bottlenecks and optimize computational resources.
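The benchmarking step above can be sketched in miniature. This is an illustrative stdlib-only example (the workload and helper names are hypothetical, not ByteDance tooling): time a deliberately slow kernel over several runs so a bottleneck shows up clearly before optimization work begins.

```python
import time

def benchmark(fn, *args, repeats=5):
    """Return the best wall-clock time over several runs (reduces timer noise)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

def naive_matmul(a, b):
    """Deliberately slow pure-Python matrix multiply -- the 'bottleneck' under test."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

if __name__ == "__main__":
    size = 40
    a = [[1.0] * size for _ in range(size)]
    b = [[2.0] * size for _ in range(size)]
    t = benchmark(naive_matmul, a, b)
    print(f"naive_matmul: {t * 1e3:.2f} ms for {size}x{size}")
```

In practice the same loop structure applies with GPU-aware tools (e.g. a profiler that records kernel times) and a vectorized baseline to compare against; taking the best of several runs rather than the mean filters out warm-up and scheduler jitter.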

Skills

Required

  • Python
  • C++
  • CUDA
  • PyTorch
  • Megatron
  • DeepSpeed
  • distributed training
  • transformers
  • diffusion models

Nice to have

  • M.S. or Ph.D. in Computer Science, Electrical Engineering, Artificial Intelligence, or a related field
  • publications at conferences such as MLSys, NeurIPS, ICLR, or ICML
  • Strong communication and teamwork skills
  • Self-motivation and strong problem-solving skills
  • Experience implementing and optimizing complex, performance-critical systems

What the JD emphasized

  • expertise in optimizing AI model training and inference
  • distributed training/inference techniques and acceleration
  • large-scale generative AI models
  • performance, scalability, and deployment
  • optimizing large model training pipelines for efficiency, speed, and scalability
  • benchmarking and profiling deep learning models to identify bottlenecks and optimize computational resources
  • Python, C++, and CUDA
  • PyTorch, Megatron, and DeepSpeed
  • transformers and diffusion models

Other signals

  • optimizing large-scale generative AI models
  • working at the cutting edge of AI efficiency