Machine Learning Engineering Manager - Evaluations

Canva · Enterprise · London, United Kingdom +1 · Information Technology

A Machine Learning Engineering Manager to coach a team of Research Scientists and Machine Learning Engineers, build production-ready evaluation systems, and turn cutting-edge ML capabilities into delightful product experiences. The role centres on owning evaluation infrastructure, building automated metrics for visual quality, and advising on human evaluation pipelines.

What you'd actually do

  1. Coaching and mentoring a high-performing team of Machine Learning Engineers and Research Scientists.
  2. Owning the evaluation infrastructure: designing, building, and maintaining robust evaluation systems (quality metrics, safety monitoring, red-teaming, competitive benchmarking) to guarantee enterprise readiness and user delight at scale.
  3. Building automated metrics that reliably predict human aesthetic judgment across dimensions like visual hierarchy, layout coherence, typography, and brand alignment.
  4. Advising on human evaluation pipelines and closing the loop between user signals and model improvements.
  5. Setting technical strategy in alignment with Canva's AI and product goals.
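Duty 3 above — automated metrics that predict human aesthetic judgment — is typically approached with human preference learning. As a minimal sketch (all names and the toy features here are illustrative, not from the posting), a pairwise Bradley-Terry model can calibrate an automated quality score against human raters' side-by-side choices:

```python
import math
import random

def train_bradley_terry(pairs, dim, epochs=200, lr=0.1):
    """Fit weights w so that score(a) - score(b) predicts that design `a`
    was preferred over design `b` by human raters.
    Each pair is (features_of_preferred, features_of_rejected)."""
    w = [0.0] * dim
    for _ in range(epochs):
        for fa, fb in pairs:
            # Model probability of the observed preference (logistic of score gap).
            diff = sum(wi * (xa - xb) for wi, xa, xb in zip(w, fa, fb))
            p = 1.0 / (1.0 + math.exp(-diff))
            # Gradient ascent on the log-likelihood of the human choice.
            g = 1.0 - p
            w = [wi + lr * g * (xa - xb) for wi, xa, xb in zip(w, fa, fb)]
    return w

def score(w, features):
    """Automated quality score; higher should track human preference."""
    return sum(wi * xi for wi, xi in zip(w, features))

# Toy data: feature 0 = a layout-coherence heuristic, feature 1 = clutter.
# Simulated raters consistently prefer coherent, uncluttered designs.
random.seed(0)
pairs = []
for _ in range(100):
    good = [random.uniform(0.6, 1.0), random.uniform(0.0, 0.4)]
    bad = [random.uniform(0.0, 0.4), random.uniform(0.6, 1.0)]
    pairs.append((good, bad))

w = train_bradley_terry(pairs, dim=2)
assert score(w, [0.9, 0.1]) > score(w, [0.2, 0.8])  # agrees with raters
```

In practice the linear scorer would be replaced by a learned model over visual features, and the metric would be validated against held-out rater agreement — the "gap between automated metrics and human raters" this role is asked to close.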

Skills

Required

  • Experience leading machine learning engineering teams, coaching engineers, and delivering production systems
  • Experience deploying and scaling generative models (Diffusion, GANs, VAEs, LLMs) in production environments, with a strong focus on visual models (image, video, design)
  • Experience building ML infrastructure, evaluation pipelines, and monitoring systems at scale
  • Experience creating data-driven evaluation methodologies
  • Ability to turn user analytics and production metrics into clear, actionable insights
  • Strong systems design skills and experience with MLOps, model serving, and production reliability
  • Comfort in collaborative environments; communicates clearly with technical and non-technical audiences
  • Keeps up with SOTA research trends and engineering best practices

Nice to have

  • Experience with visual quality assessment, aesthetic modelling, or human preference learning
  • Understanding of the gap between automated metrics and human raters
  • Knowledge of design principles (hierarchy, balance, typography, colour theory) and the ability to operationalise them as measurable signals

What the JD emphasized

  • production-ready evaluation systems
  • guarantee enterprise readiness
  • predict human aesthetic judgment
  • human evaluation pipelines
  • model improvements
  • production systems
  • visual quality assessment
  • human preference learning
  • automated metrics and human raters

Other signals

  • leading a team of ML Engineers and Research Scientists
  • owning evaluation infrastructure
  • building automated metrics that predict human aesthetic judgment
  • advising on human evaluation pipelines
  • setting technical strategy in alignment with AI and product goals
  • guiding engineering direction across model deployment, evaluation infrastructure, and production systems
  • partnering cross-functionally to ensure ML capabilities translate into reliable product impact
  • deploying and scaling generative models (Diffusion, GANs, VAEs, LLMs) in production environments with a strong focus on visual models
  • building ML infrastructure, evaluation pipelines, and monitoring systems at scale
  • creating data-driven evaluation methodologies
  • strong systems design skills and experience with MLOps, model serving, and production reliability
  • experience with visual quality assessment, aesthetic modelling, or human preference learning
  • understand design principles well enough to operationalise them as measurable signals