Large Model Training Acceleration Engineer

ByteDance · Big Tech · San Jose, CA · R&D

ByteDance's Intelligent Creation - AI Platform team is looking for an experienced AI model optimization engineer to optimize large model training pipelines, develop distributed training strategies, and benchmark deep learning models. The role requires expertise in Python, C++, CUDA, deep learning frameworks (PyTorch, Megatron, DeepSpeed), and distributed training techniques, plus working knowledge of transformers and diffusion models.

What you'd actually do

  1. Optimize large model training pipelines to improve efficiency, speed, and scalability (a mixed-precision sketch follows this list).
  2. Develop and improve distributed training strategies, such as data parallelism, model parallelism, pipeline parallelism, and communication optimization, to accelerate model training (see the DDP sketch below).
  3. Benchmark and profile deep learning models to identify performance bottlenecks and make better use of computational resources (see the profiling sketch below).
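
To make item 1 concrete: mixed-precision training is one of the most common pipeline-efficiency levers in this stack. The sketch below uses PyTorch's autocast plus gradient scaling; the toy linear model, tensor sizes, and step count are illustrative assumptions, not details from the JD.

```python
# Mixed-precision sketch (torch.amp): a common first step when tuning
# training-pipeline efficiency on NVIDIA GPUs. Model and shapes are toys.
import torch

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()

for step in range(10):
    x = torch.randn(32, 1024, device="cuda")
    optimizer.zero_grad(set_to_none=True)
    # Run the forward pass and loss in float16 where it is numerically safe;
    # master weights stay in float32.
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = model(x).square().mean()
    scaler.scale(loss).backward()  # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)
    scaler.update()
```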
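For the parallelism strategies in item 2, here is a minimal data-parallel sketch using PyTorch DistributedDataParallel (DDP). The launch method (torchrun, one process per GPU), toy model, and hyperparameters are assumptions for illustration; a production pipeline would layer tensor or pipeline parallelism (e.g. via Megatron or DeepSpeed) on top.

```python
# Minimal data-parallel training sketch (PyTorch DDP).
# Assumed launch: torchrun --nproc-per-node=<num_gpus> train.py
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK / LOCAL_RANK / WORLD_SIZE in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy model; a real pipeline would build a transformer here.
    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):
        x = torch.randn(32, 1024, device=local_rank)
        loss = model(x).square().mean()
        optimizer.zero_grad(set_to_none=True)
        loss.backward()  # gradients are all-reduced across ranks here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```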
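For the benchmarking work in item 3, torch.profiler is the usual starting point. A hedged sketch, assuming a CUDA device and a placeholder MLP; sorting by GPU time surfaces the kernels worth optimizing first.

```python
# Profiling sketch to locate training bottlenecks (torch.profiler).
import torch
from torch.profiler import profile, ProfilerActivity

# Placeholder MLP standing in for a real transformer/diffusion workload.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 1024)
).cuda()
x = torch.randn(64, 1024, device="cuda")

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    record_shapes=True,
) as prof:
    loss = model(x).square().mean()
    loss.backward()

# Rank operators by GPU time to find the hot spots.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```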

Skills

Required

  • Python
  • C++
  • CUDA
  • PyTorch
  • Megatron
  • DeepSpeed
  • data parallelism
  • model parallelism
  • pipeline parallelism
  • transformers
  • diffusion models

Nice to have

  • inference optimization

What the JD emphasized

  • AI model training optimization
  • distributed training
  • transformers and diffusion models

Other signals

  • large-scale generative AI models
  • optimizing AI model training and inference
  • distributed training/inference and acceleration