Machine Learning Engineer III

Workday · Enterprise · Pleasanton, CA (+2 more locations)

Workday is hiring a Machine Learning Engineer III to work on its Agent Evaluation Platform and Information Retrieval products. The role involves architecting agentic AI, driving meta-ML and optimization, advancing information retrieval with semantic search and text-to-SQL/Python, scaling evaluation and observability pipelines, and owning the ML lifecycle. The focus is on applied research and production deployment of AI agents and search systems.

What you'd actually do

  1. Architect Agentic AI: Design and deploy sophisticated reasoning, planning, and swarm agents that interact seamlessly with enterprise data and support continuous, lifelong learning.
  2. Drive Meta-ML & Optimization: Develop algorithms for automated node-level optimization within agent graphs, identifying the best LLM and prompt configuration for each workflow step. Build recommender systems that help engineering teams choose the best evaluation approach for their agents.
  3. Advance Information Retrieval: Build hybrid, agentic search systems and semantic parsing products (Text-to-SQL/Python) using vector search, reasoning, and fine-tuning for structured output.
  4. Scale Evaluation & Observability: Engineer cloud-based pipelines (Kubeflow) and A/B testing frameworks for rigorous offline/online evaluation, failure attribution, and safety monitoring.
  5. Lead the ML Lifecycle: Own the end-to-end MLOps process, from exploration and prompt engineering to scalable production deployment, ensuring high-quality, reliable performance.

Skills

Required

  • 3+ years of experience researching, developing, and deploying production-grade ML systems
  • Expertise in deep learning, NLP, information retrieval, and recommender systems
  • Experience with frameworks such as PyTorch or TensorFlow
  • Proven track record of building and evaluating NLP and LLM-powered products
  • Expertise in RAG architectures
  • Expertise in agentic frameworks (e.g., LangChain/LangGraph)
  • Expertise in long-context LLM applications (e.g., Text-to-SQL)
  • 2+ years of Python experience

Nice to have

  • Kubeflow
  • A/B testing frameworks
  • Prompt engineering

What the JD emphasized

  • rigorous, data-driven optimization, evaluation and validation of their agents
  • rigorous offline/online evaluation, failure attribution, and safety monitoring
  • exclusive, high-integrity enterprise datasets
  • absolute frontier of Agentic AI
  • validate, scale, and optimize an agent
  • extract the correct data for agents
  • gatekeeper of quality for products reaching 31 million users

Other signals

  • building agentic AI systems
  • evaluating and optimizing agents
  • information retrieval and semantic search
  • deploying production ML systems