Principal Machine Learning Engineer

Zillow · Consumer · United States · Remote

A Principal Machine Learning Engineer who sets technical direction and architects and ships large-scale agentic data platform capabilities, including context engineering, agentic memory, and AI workflows. The role involves leading cross-organization execution, mentoring engineers, and ensuring production-grade defaults such as observability and evaluation.

What you'd actually do

  1. Set the technical direction. Define and own the multi-quarter architecture roadmap for the agentic data foundations (context engineering, agentic memory, and AI workflows) that power Zillow's agentic experiences.
  2. Architect and ship at scale. Design, prototype, and ship systems that handle hundreds of millions of agent interactions with high availability, low latency, and predictable cost. Stay hands-on in code and production when it matters.
  3. Drive cross-organization execution. Lead complex, multi-team initiatives across Agentic AI and Platform teams, aligning on architecture, surfacing dependencies, and driving outcomes through influence rather than direct authority.
  4. Communicate to every level. Translate complex platform trade-offs, ambiguous customer problems, and emerging agentic paradigms into clear, actionable insights for engineering peers, product partners, Directors, and VPs.
  5. Grow senior technical talent. Mentor Senior and Staff engineers, raise the bar on technical judgment and architecture decisions, and shape the engineering culture of the org.

Skills

Required

  • 10+ years building, scaling, and operating large-scale data and ML infrastructure
  • 1-2 years shipping agent-based or LLM-powered systems to production
  • 3+ years as a technical leader spanning multiple organizations
  • Hands-on experience designing and shipping agentic AI in production
  • Platform engineering background in scaling and abstracting large-scale data and ML infrastructure
  • Expert in distributed systems architecture
  • Expert-level Python
  • Deep experience with agentic frameworks (LangGraph, LangChain, Agents SDK, AutoGen)
  • Large-scale data processing (Spark, Databricks, Airflow, Temporal)
  • Vector stores
  • Cloud infrastructure (AWS preferred)
  • Cross-organization leadership and communication

Nice to have

  • Advanced degree (M.S. or Ph.D.) in Computer Science, Machine Learning, or a related field, with emphasis on building distributed systems and AI
  • Experience building data platforms for agentic systems or real‑time AI applications
  • Experience working with regulated, privat

What the JD emphasized

  • shipping agent-based or LLM-powered systems to production
  • agentic systems expertise
  • orchestration, tool use, memory and context engineering, retrieval (embeddings, hybrid search, ranking) and evaluation
  • how LLM-based systems fail in production and how to engineer around it
  • platform fluency
  • expert-level Python
  • deep experience with agentic frameworks (LangGraph, LangChain, Agents SDK, AutoGen)
  • large-scale data processing (Spark, Databricks, Airflow, Temporal)
  • vector stores
  • cloud infrastructure (AWS preferred)

Other signals

  • building agentic systems at scale
  • defining architecture roadmap for agentic data foundations
  • shipping systems that handle hundreds of millions of agent interactions
  • mentoring senior technical talent