Machine Learning Engineer (Training Optimization)

Canva · Enterprise · Beijing, Beijing, China · Engineering

Machine Learning Engineer focused on optimizing the training system for large-scale multimodal and foundation models. The work spans distributed systems, performance tuning, and low-level optimization.

What you'd actually do

  1. You’ll design, implement, and optimize large-scale machine learning systems for training.
  2. You’ll improve all aspects of performance, including GPU utilization, communication overhead, and memory efficiency.
  3. You’ll partner with research and modeling teams to align systems with algorithmic needs.
  4. You’ll evaluate and apply best practices for distributed training using industry-leading frameworks.
  5. You’ll dive deep into low-level optimization, including custom CUDA or Triton kernels.
  6. You’ll debug, profile, and fine-tune training workflows to unlock new levels of scalability.

Skills

Required

  • Python
  • PyTorch or JAX
  • Megatron-LM, NeMo, or DeepSpeed
  • FSDP/ZeRO, gradient checkpointing, or low-precision data types
  • custom GPU kernels in CUDA or Triton
  • English

Nice to have

  • C++ or Rust
  • LLMs, multimodal AI, or diffusion models

What the JD emphasized

  • large-scale machine learning systems for training
  • distributed training
  • low-level optimization
  • custom CUDA or Triton kernels
  • debug, profile, and fine-tune training workflows

Other signals

  • large-scale multimodal and foundation models
  • distributed training systems
  • pushing the limits of performance