Machine Learning Engineer III / Senior Machine Learning Engineer - AI Platform

Workday · Enterprise · Toronto, ON +2

Workday is seeking a Machine Learning Engineer to build its Agent Platform, the infrastructure for hosting and scaling LLM-powered agent applications. The role covers systems for agent execution, orchestration, observability, evaluation, and reliability, and involves working closely with applied AI, product, and infrastructure teams to define how agents are built and operated.

What you'd actually do

  1. Design and build the core platform capabilities required to develop, host, and operate AI agents at scale.
  2. Develop infrastructure and services for agent execution, orchestration, state management, and runtime reliability.
  3. Build reusable abstractions, frameworks, and workflows in Python to support agent development patterns across teams.
  4. Design and implement systems for tool use, memory, retrieval, workflow coordination, and human-in-the-loop interactions.
  5. Build and maintain services deployed on Kubernetes, with a focus on scalability, resiliency, and operational excellence.

Skills

Required

  • Python
  • distributed systems
  • APIs
  • asynchronous workflows
  • service-oriented architecture
  • scalability
  • reliability
  • observability
  • maintainability
  • Kubernetes

Nice to have

  • agent platforms
  • AI infrastructure
  • internal developer platforms
  • machine learning or LLM-powered applications in production
  • tool calling
  • retrieval-augmented generation (RAG)
  • memory and context management
  • multi-step workflows and orchestration
  • human-in-the-loop systems
  • evaluation frameworks for LLM or agent quality

What the JD emphasized

  • agent-based applications powered by LLMs
  • agent execution
  • workflow orchestration
  • observability
  • evaluation
  • reliability
  • developer experience
  • tool use
  • memory
  • retrieval
  • workflow coordination
  • human-in-the-loop interactions
  • scalability
  • resiliency
  • operational excellence
  • evaluation frameworks for LLM or agent quality

Other signals

  • building agent platforms
  • LLM-powered applications
  • agent execution
  • workflow orchestration