Applied Scientist

Zillow · Consumer · United States · Remote

The Connections AI team at Zillow is building intelligence and decisioning systems to help real estate agents and loan officers operate as a coordinated team. This role focuses on building foundational systems that convert behavioral, conversational, and operational signals into shared intelligence and actionable recommendations. The work involves real-time signal ingestion, data understanding, ranking, prediction, decisioning, and agentic execution, leveraging LLMs, agentic AI, and classical ML. The primary focus is on propensity, ranking, prediction, and decisioning systems, with selective use of Generative AI.

What you'd actually do

  1. Build and improve applied ML systems. Develop, evaluate, and improve models that help identify customer readiness, prioritize opportunities, and recommend the next best action in partner workflows.
  2. Work with large, messy, real-world data. Use behavioral, conversational, operational, and CRM data to create features, analyze performance, and improve model quality.
  3. Design strong evaluation for both ML and GenAI. Define offline and online evaluation approaches, select metrics tied to business outcomes, and help build trustworthy evaluation methods for generative AI features such as summaries or recommendations.
  4. Prototype and test new ideas. Independently explore new models and enhancements to existing models, and assess trade-offs through careful experimentation. Stay current with academic and industry advances, present key learnings internally, and translate promising ideas into practical improvements.
  5. Partner across disciplines. Work closely with product managers, engineers, and other scientists to refine problem statements, translate requirements into measurable objectives, and ship production-ready solutions.
  6. Contribute to reliable production science. Write clear, well-tested code for data preparation, training, and evaluation, and bring models to a state ready for integration into production systems.

Skills

Required

  • Python
  • common ML tooling
  • clean, tested, maintainable code
  • supervised learning
  • ranking
  • classification
  • regression
  • feature engineering
  • practical model iteration
  • designing and analyzing experiments
  • choosing metrics thoughtfully
  • spotting issues in measurement
  • applying correct evaluation methods
  • working with imperfect real-world datasets
  • text or event data
  • understanding how data quality and bias affect modeling outcomes
  • communicating assumptions, trade-offs, and results clearly
  • working effectively with engineering and product partners

Nice to have

  • NLP
  • language-data experience
  • text-heavy products or datasets
  • Generative AI evaluation experience
  • evaluating LLM-based features
  • human evaluation
  • rubric-based review
  • LLM-as-judge approaches
  • safety and quality guardrails
  • Marketplace or workflow optimization experience
  • recommendat

What the JD emphasized

  • building the AI and machine learning systems
  • building and improving models
  • design strong evaluation
  • evaluate
  • evaluation methods
  • prototype and test new ideas
  • models
  • production science
  • applied ML experience
  • applying machine learning
  • evaluation-first mindset
  • designing and analyzing experiments
  • correct evaluation methods
  • messy and unstructured data
  • modeling outcomes
  • production-ready solutions

Other signals

  • building foundational systems
  • turning signals into intelligence and recommendations
  • real-time signal ingestion
  • structured and unstructured data understanding
  • ranking and prioritization
  • explainable decisioning
  • agentic execution
  • LLMs
  • agentic AI
  • classical ML
  • cross-system data