Senior Research Scientist - Reinforcement Learning, MoEs

Canva · Enterprise · Vienna, Vienna, Austria · Information Technology

Senior Research Scientist focused on Reinforcement Learning, Mixture of Experts (MoEs), and agentic systems for multimodal AI in design. The role involves research, development, and shipping of post-training techniques, reward modeling, and agent stack components, with a strong emphasis on scaling and reliability.

What you'd actually do

  1. Develop agent systems (planning, multimodal tool use, retrieval, novel training approaches, modeling ablations) for real tasks in design, vision, and language.
  2. Scale post-training and RL across distributed systems (PyTorch) with efficient data loaders, tracing/telemetry, stable training of mixture-of-experts (MoE) architectures, and reproducible pipelines; profile, debug, and optimize.
  3. Contribute to the research agenda for RL/agentic systems aligned with Canva’s product goals; identify high‑leverage bets and retire dead ends quickly.
  4. Build reward models and learning loops: RLHF/RLAIF, preference modeling, DPO/IPO‑style objectives, offline/online RL, curriculum learning, and credit assignment for multi‑step reasoning.
  5. Develop simulation and sandbox tasks that surface failure modes (planning errors, tool‑use brittleness, hallucination, unsafe actions) and turn them into measurable targets.

Skills

Required

  • Reinforcement Learning
  • Mixture of Experts (MoEs)
  • Agentic Systems
  • Multimodal Models
  • Post-training
  • Python
  • PyTorch
  • Distributed Training
  • Experimental Design
  • Policy Optimization
  • Reward Modeling
  • Preference Learning
  • Large-scale training
  • Cloud tooling

Nice to have

  • Video modeling
  • Audio modeling
  • Multi-agent settings
  • Alignment evaluations
  • Safety evaluations
  • Red-teaming
  • Risk mitigation for tool-using agents
  • Open-source contributions
  • Benchmark contributions
  • Shared evaluation suites for agents

What the JD emphasized

  • Track record of shipped research or publications in MoEs, RL, or agents
  • Experience modifying and adapting open-source models
  • Strong experience with experimental design: tight baselines, clean ablations, reproducibility, and clear, data‑backed conclusions
  • Fluency in Python and PyTorch; you’re comfortable in large ML codebases and can profile, debug, and optimize training and inference
  • Practical experience building agent loops (planning, tool invocation, retrieval, memory) and evaluating multi‑step reasoning quality
  • Hands‑on experience with policy optimization, reward modeling, and preference learning (e.g., RLHF/RLAIF, DPO/IPO, actor‑critic/PPO, offline RL)
  • Experience with large‑scale training (distributed training, experiment tracking, evaluation harnesses) and cloud multimodal tooling
  • Experience with RL for MoE architectures

Other signals

  • Reinforcement Learning
  • Mixture of Experts (MoEs)
  • Agentic Systems
  • Multimodal Models
  • Post-training