Senior/Principal Machine Learning Engineer

Workday · Enterprise · Pleasanton, CA +1

Workday is seeking a Senior/Principal Machine Learning Engineer to join its Agent Factory team. The role centers on designing and building the core ML systems for next-generation AI agents, including LLM-powered agents, RAG pipelines, and workflow orchestration. It requires end-to-end ownership of production ML systems, from problem framing to deployment and monitoring, with a strong emphasis on scalability, observability, and enterprise readiness. The engineer collaborates closely with software engineers and product managers to integrate agents into Workday's platform.

What you'd actually do

  1. design and build the core ML systems behind Workday’s next generation of AI agents
  2. implement and evolve frameworks for LLM-powered agents, including RAG pipelines, workflow orchestration, evaluation, and feedback loops, ensuring solutions are scalable, observable, and enterprise-ready
  3. stay hands-on with emerging techniques in agentic architectures while applying strong engineering judgment to turn them into systems that are reliable, explainable, and built to operate at global scale
  4. partner closely with software engineers, product managers, and data scientists to integrate agents deeply into the Workday stack
  5. own how models, agent logic, and orchestration layers come together in production, across the full lifecycle from problem framing and data strategy to deployment, monitoring, and continuous improvement

Skills

Required

  • design and build core ML systems
  • implement and evolve frameworks for LLM-powered agents
  • RAG pipelines
  • workflow orchestration
  • evaluation
  • feedback loops
  • scalable solutions
  • observable solutions
  • enterprise-ready solutions
  • emerging techniques in agentic architectures
  • strong engineering judgment
  • reliable systems
  • explainable systems
  • systems built to operate at global scale
  • ML and platform engineering
  • cloud computing platforms (AWS, GCP)
  • large language models (LLMs)
  • text generation models
  • graph neural network models
  • machine learning and deep learning frameworks (PyTorch, TensorFlow)
  • building services to host machine learning models in production
  • leading, mentoring, and/or managing ML Engineering teams
  • ownership of development lifecycle
  • sprint planning
  • collaboration
  • transparency
  • innovation
  • continuous improvement
  • statistical analysis
  • unsupervised and supervised machine learning algorithms
  • natural language processing
  • information retrieval
  • recommendation system use cases
  • independently solving ambiguous, open-ended problems
  • technically leading teams
  • interpersonal and communication skills

Nice to have

  • PhD preferred
  • Master's preferred

What the JD emphasized

  • production-grade AI
  • deeply embedded into Workday’s platform
  • production-based evaluation
  • building services to host machine learning models in production at scale
  • enterprise-ready

Other signals

  • production-grade AI
  • intelligent agents
  • LLM-powered agents
  • enterprise-ready