Multimodal Model Training and Inference Optimization Engineer

ByteDance · Big Tech · San Jose, CA · R&D

ByteDance is looking for an experienced Multimodal Model Training and Inference Optimization Engineer to join its Vision-Applied Research team. This role focuses on optimizing AI model training and inference, including distributed training/inference and acceleration, to improve the performance, scalability, and deployment of large-scale generative AI models behind ByteDance products such as TikTok and CapCut. The ideal candidate has expertise in optimizing AI model training, designing distributed training strategies, and benchmarking deep learning models.

What you'd actually do

  1. Optimize large model training pipelines to improve efficiency, speed, and scalability.
  2. Develop and improve distributed training strategies, such as data parallelism, model parallelism, and pipeline parallelism, and optimize communication to accelerate model training.
  3. Benchmark and profile deep learning models to identify performance bottlenecks and optimize computational resources.
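As a rough illustration of the benchmarking work in point 3, below is a minimal timing harness using only the Python standard library. The `benchmark` helper, its parameters, and the example workload are illustrative assumptions, not part of the role description; in practice this team would lean on tools such as `torch.profiler` or Nsight Systems rather than wall-clock timing.

```python
import statistics
import time


def benchmark(fn, *args, warmup=3, iters=20):
    """Time a callable: run a few warmup passes, then report
    median and p95 latency in milliseconds over `iters` runs."""
    for _ in range(warmup):  # warmup amortizes one-time costs (caches, JIT)
        fn(*args)
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1e3)  # ms
    samples.sort()
    return {
        "median_ms": statistics.median(samples),
        # index into the sorted samples for an approximate 95th percentile
        "p95_ms": samples[int(0.95 * (len(samples) - 1))],
    }


# Hypothetical workload standing in for a model's forward step.
result = benchmark(sum, range(100_000))
print(result)
```

Comparing median against p95 is a quick way to spot jitter (e.g. contention or stragglers) before reaching for a full profiler to locate the bottleneck kernel or communication phase.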

Skills

Required

  • Python
  • C++
  • CUDA
  • PyTorch
  • Megatron
  • DeepSpeed
  • transformers
  • diffusion models

Nice to have

  • Publications at conferences such as MLSys, NeurIPS, ICLR, or ICML
  • Experience implementing and optimizing complex, performance-critical systems

What the JD emphasized

  • AI model training optimization
  • distributed training techniques

Other signals

  • optimizing large-scale generative AI models
  • distributed training/inference
  • acceleration
  • performance, scalability, and deployment