Senior Machine Learning Engineer - Multimodal Data

Canva · Enterprise · London, United Kingdom (+1 location) · Information Technology

Canva is seeking a Senior Machine Learning Engineer to own the data lifecycle for its multimodal agent research. The role covers designing and building data pipelines, infrastructure for data processing and retrieval, and tooling for dataset construction, including human annotation and synthetic data generation. The engineer will collaborate with researchers to define data needs, ensure data quality, and help build scalable training and evaluation loops for multimodal agentic systems.

What you'd actually do

  1. Design and build data pipelines for agent training: collection, filtering, deduplication, formatting, and versioning across text, image, and multimodal sources.
  2. Build and maintain infrastructure for efficient data loading, storage, and retrieval at scale (S3, distributed systems, streaming pipelines).
  3. Collaborate with research scientists to translate research requirements into concrete data specifications, and iterate as experiments reveal new needs.
  4. Create evaluation datasets and benchmarks in collaboration with researchers—curating task distributions that surface real failure modes.
  5. Develop tooling for dataset construction—including human annotation workflows, synthetic data generation, and preference data collection for RLHF/DPO-style training.
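The first responsibility above (collection, filtering, deduplication, formatting) can be sketched at toy scale. This is a minimal illustration, not Canva's actual stack: the record shape and the `dedup_records` helper are hypothetical, and a production pipeline would typically use near-duplicate detection (MinHash, embedding similarity) rather than exact hashing.

```python
import hashlib

def dedup_records(records):
    """Drop exact duplicates by content hash of normalized text --
    a stand-in for the fuzzier near-dedup a real pipeline would run."""
    seen, unique = set(), []
    for rec in records:
        # Collapse whitespace so trivially different copies hash the same.
        key = hashlib.sha256(" ".join(rec["text"].split()).encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

corpus = [
    {"text": "design a poster", "source": "web"},
    {"text": "design  a poster", "source": "crawl"},  # whitespace-only duplicate
    {"text": "resize the image", "source": "web"},
]
deduped = dedup_records(corpus)  # 2 records survive
```

In practice this step would sit between collection and formatting, with the surviving records then versioned and sharded for training.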

Skills

Required

  • Python
  • production-grade data pipelines
  • ML DevOps
  • prompt engineering for reliable LLM/VLM outputs
  • ML data workflows
  • large-scale data processing and loading
  • data versioning
  • format considerations for training
  • large-scale distributed ML training runs
  • annotation tooling
  • human-in-the-loop data collection
  • ML training requirements
  • loading and writing large datasets to/from cloud infrastructure (AWS)
  • distributed storage systems
  • communication skills
  • collaborative approach
  • ownership
  • iterating quickly

Nice to have

  • Ray
  • Label Studio
  • preference data collection for RLHF or reward modelling
  • multimodal data (image-text pairs, video, design assets)
  • synthetic data generation pipelines using LLMs
  • data quality metrics and monitoring systems
  • contributions to dataset releases or benchmarks in the ML community
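The preference-data skill above has a conventional record shape worth illustrating: DPO-style trainers commonly consume (prompt, chosen, rejected) triples. A minimal sketch, assuming a hypothetical `to_dpo_pairs` helper and annotation format, not any internal Canva tooling:

```python
def to_dpo_pairs(annotations):
    """Turn ranked annotator completions into (prompt, chosen, rejected)
    triples -- the record shape DPO-style trainers commonly consume.
    Rank 1 is best; the worst-ranked completion becomes `rejected`."""
    pairs = []
    for ann in annotations:
        ranked = sorted(ann["completions"], key=lambda c: c["rank"])
        pairs.append({
            "prompt": ann["prompt"],
            "chosen": ranked[0]["text"],
            "rejected": ranked[-1]["text"],
        })
    return pairs

annotations = [{
    "prompt": "Summarise this design brief in one line.",
    "completions": [
        {"text": "A long, rambling restatement of the brief.", "rank": 2},
        {"text": "One-line summary of the brief.", "rank": 1},
    ],
}]
pairs = to_dpo_pairs(annotations)
```

Human-in-the-loop tools such as Label Studio would supply the ranked completions; this conversion is the handoff into training.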

What the JD emphasized

  • Mission: multimodal agent research; data foundations; training pipelines, datasets, and tooling; scalable training and evaluation loops for multimodal agentic systems
  • Data lifecycle ownership: collection and curation, preprocessing, quality assurance, and delivery into training pipelines; design and build the systems reliably and at scale, with significant autonomy in aligning on which data problems matter most
  • Agent training data: text, image, and multimodal sources; efficient data loading, storage, and retrieval at scale; distributed systems and streaming pipelines
  • Research collaboration: translate research requirements into concrete data specifications and iterate as experiments reveal new needs; build evaluation datasets and benchmarks, curating task distributions that surface real failure modes
  • Dataset construction: human annotation workflows, synthetic data generation, and preference data collection for RLHF/DPO-style training
  • Data quality: validation frameworks; monitoring for drift and contamination; standards that keep datasets trustworthy and reproducible
  • Documentation: provenance, known limitations, intended use cases, and versioning history for each dataset
  • Engineering rigor: comprehensive test coverage across data pipelines and ML workflows, for reliability and catching regressions early; elevating codebase quality through code reviews, refactoring, and engineering best practices so research velocity scales sustainably
  • Planning: contributing to team roadmaps by identifying data bottlenecks and proposing solutions that unblock research velocity
  • Required skills echoed: strong software engineering in Python; production-grade data pipelines; ML DevOps; prompt engineering (designing, testing, and refining prompts for reliable LLM/VLM outputs); ML data workflows; large-scale data processing and loading (Ray, or similar); data versioning and format considerations for training (tokenization, batching, sharding); data pipelines for large-scale distributed ML training runs; annotation tooling and human-in-the-loop data collection (Label Studio or internal systems); understanding ML training requirements and what good data for LLM/VLM fine-tuning looks like, anticipating downstream issues; loading and writing large datasets to/from cloud infrastructure (AWS) and distributed storage systems; strong communication skills (working with researchers, scoping ambiguous problems, translating needs into actionable plans); a collaborative approach, comfort taking ownership, and iterating quickly
  • Nice-to-haves echoed: preference data collection for RLHF or reward modelling; multimodal data (image-text pairs, video, design assets); synthetic data generation pipelines using LLMs; data quality metrics and monitoring systems; contributions to dataset releases or benchmarks in the ML community
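The documentation emphasis in the list above (provenance, known limitations, intended use cases, versioning history) can be made concrete as a machine-readable dataset card. A minimal sketch under stated assumptions: the `dataset_card` function, field names, and example values are illustrative, not an internal Canva format.

```python
import hashlib
import json

def dataset_card(name, version, provenance, limitations, intended_use, files):
    """Emit a minimal machine-readable dataset card covering provenance,
    known limitations, intended use, and a verifiable version pin."""
    card = {
        "name": name,
        "version": version,
        "provenance": provenance,
        "known_limitations": limitations,
        "intended_use": intended_use,
        # Hash the sorted file list so a version is verifiable, not just a label.
        "content_hash": hashlib.sha256(
            "\n".join(sorted(files)).encode()
        ).hexdigest()[:12],
    }
    return json.dumps(card, indent=2)

card = json.loads(dataset_card(
    name="agent-eval-v1",  # hypothetical dataset name
    version="1.2.0",
    provenance=["web crawl 2024-06", "human annotation batch 3"],
    limitations=["English-only prompts"],
    intended_use="evaluation only; not for training",
    files=["shard-000.jsonl", "shard-001.jsonl"],
))
```

Checking cards like this into version control alongside the data gives the reproducibility and drift-auditing trail the posting asks for.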
