Sr Machine Learning Engineer

Workday · Enterprise · Toronto, ON +1

Senior Machine Learning Engineer on the AI Core team, responsible for designing, building, and deploying novel agentic systems and core machine learning models for AI-driven applications at Workday. This role involves end-to-end MLOps, working with large-scale datasets, and applying ML and distributed systems principles in production.

What you'd actually do

  1. Lead the design, development, and deployment of novel agentic systems and core machine learning models that power AI-driven capabilities.
  2. Execute data analysis, error analysis, and rigorous experimentation to drive model improvements and new capability development.
  3. Design and implement the end-to-end machine learning pipeline (MLOps), ensuring model scalability, reliability, and consumption via robust APIs.
  4. Work with large-scale datasets to perform data wrangling, feature engineering, and validation to train and fine-tune state-of-the-art models.
  5. Apply machine learning and distributed systems principles in production to address model scalability, concurrency, fault tolerance, and performance challenges.

Skills

Required

  • advanced Python development
  • designing, training, and evaluating Machine Learning models
  • deploying Machine Learning models to production environments
  • building agentic systems
  • leveraging LLMs
  • retrieval-augmented generation (RAG)
  • sophisticated prompting techniques
  • PyTorch
  • TensorFlow
  • Scikit-learn
  • MLOps tools
  • data wrangling
  • feature engineering
  • data validation techniques for large-scale datasets
  • advanced Python concepts
  • asynchronous and concurrent programming
  • generators
  • higher-order abstractions
  • object-oriented design principles
  • Unix systems
  • cloud platforms
  • containerized workloads
  • orchestration systems
  • AWS
  • GCP
  • Docker
  • Kubernetes

Nice to have

  • mentoring and coaching other engineers
  • architectural thinking skills

What the JD emphasized

  • designing, building, and scaling production ML models and systems
  • building agentic systems
  • end-to-end machine learning pipeline (MLOps)

Other signals

  • AI platform capabilities
  • agentic systems
  • enterprise-scale systems
  • production ML models and systems