Manager, Software Development Engineering - AI Platform

Workday · Enterprise · Vancouver, BC +1

Engineering Manager to lead a team building the Agent Platform, the core infrastructure for developing, deploying, orchestrating, and operating AI agents in production. The role involves leading engineers who work on backend services, distributed systems, and developer tooling; driving execution; shaping technical direction; and building a high-performing team.

What you'd actually do

  1. Lead, mentor, and grow a team of engineers building the Agent Platform.
  2. Drive execution of the team’s roadmap, ensuring high-quality, timely delivery of platform capabilities.
  3. Partner with engineers to design and build systems for agent execution, orchestration, lifecycle management, and reliability.
  4. Collaborate with cross-functional partners (product, AI/ML, infrastructure) to define requirements and prioritize investments.
  5. Contribute to technical architecture and design decisions, balancing short-term delivery with long-term scalability.

Skills

Required

  • 5+ years of software engineering experience
  • 2+ years in a people management role
  • 4+ years of building and operating production-grade backend or platform systems
  • 4+ years of technical background in distributed systems and scalable service architecture
  • 4+ years of experience working with Python (or a similar language such as Java or Go)

Nice to have

  • Experience leading teams working on platforms, infrastructure, or developer tooling
  • Experience delivering complex technical projects in ambiguous, fast-moving environments
  • Familiarity with AI/ML systems or LLM-powered applications in production
  • Familiarity with Kubernetes and cloud-native systems
  • Track record of building and growing high-performing engineering teams
  • Experience with workflow orchestration, distributed pipelines, or complex multi-step systems
  • Strong ability to partner across teams and influence technical direction
  • Understanding of observability, reliability engineering, and production operations
  • Experience defining technical strategy and long-term platform vision
  • Excellent communication, prioritization, and execution skills
  • Experience hiring and scaling teams in a high-growth environment

What the JD emphasized

  • production-grade backend or platform systems
  • distributed systems
  • scalable service architecture
  • agent execution
  • agent orchestration
  • lifecycle management
  • reliability
  • AI/ML systems
  • LLM-powered applications in production
  • Kubernetes
  • cloud-native systems
  • workflow orchestration
  • distributed pipelines
  • complex multi-step systems
  • observability
  • reliability engineering
  • production operations

Other signals

  • building foundational platforms for emerging technologies
  • agent platform
  • AI agents in production