Principal Software Engineer, Applied AI Services

Zillow · Consumer · Mexico City, Mexico

Zillow is hiring a Principal Software Engineer to lead the architecture and evolution of its backend and AI systems, with a focus on integrating AI/ML/LLM capabilities into production services. The role involves building pipelines, services, and evaluation frameworks, and defining system-wide standards for AI-powered features.

Skills

Required

10+ years of software engineering experience
delivering and scaling complex, distributed backend systems
large‑scale microservices
event‑driven architectures
cloud environments (AWS or equivalent)
Kubernetes
databases, caching, and data‑intensive services
schema design, performance optimization, and reliability
data pipelines and ML workflows (e.g., Databricks, Spark, Kafka, Airflow or equivalents)
system‑wide abstractions, frameworks, or platforms
building or scaling AI/ML or LLM‑powered systems in production
integrating models or LLMs into production services
owning or co‑owning data ingestion, feature pipelines, or model‑serving paths
defining or implementing evaluation and guardrail mechanisms
led cross‑team technical initiatives as an IC

Nice to have

AI/ML/LLM workflows
evaluation and guardrail frameworks
agentic systems
LLM‑as‑judge
offline/online evaluation
human‑in‑the‑loop review loops
Agentic AI
data platform teams
APIs and contracts (REST/GraphQL, events, DRDCs)
data schemas and lineage
observability
operational runbooks
background agents for KTLO
AI‑assisted design, implementation, and testing patterns

What the JD emphasized

lead the architecture and evolution of our backend and AI systems
operate at the intersection of large‑scale backend systems
applied AI/ML systems
build the pipelines, services, and evaluation capabilities
AI‑powered experiences
AI is both a product capability and an engineering accelerator
investing in LLM‑ and ML‑powered workflows, evaluation, and agentic systems
bring AI capabilities into production services safely and repeatably
evolve our user intent flywheel and HDP backend platform
define system‑wide standards, frameworks, and abstractions
enable many teams to ship AI‑powered features faster and with higher quality
hands‑on principal IC role
design and build systems yourself
provide technical leadership across multiple teams and orgs
architect end‑to‑end applied AI services
AI/ML/LLM workflows
evaluation and guardrail frameworks for AI‑powered capabilities
integrate AI into their services and ship AI‑powered features faster
Drive AI/ML and LLM‑powered systems from prototype to production
integration of models and LLMs into online decision flows and APIs
definition of evaluation methodologies, metrics, and regression gates
Partner with AI/ML, Agentic AI, and data platform teams
ensure AI systems remain measurable, debuggable, and reproducible
Lead multi‑team technical initiatives
system‑wide standards and abstractions
observability, evaluation, and operational runbooks for AI‑powered services
Mentor senior engineers
champion the use of AI as a force multiplier for engineering
background agents for KTLO
AI‑assisted design, implementation, and testing patterns
define which workflows should be agent‑assisted versus human‑led
built large‑scale microservices and event‑driven architectures
data pipelines and ML workflows
how they connect to online systems
designed system‑wide abstractions, frameworks, or platforms
building or scaling AI/ML or LLM‑powered systems in production
integrating models or LLMs into production services
owning or co‑owning data ingestion, feature pipelines, or model‑serving paths
defining or implementing evaluation and guardrail mechanisms
led cross‑team technical initiatives as an IC

Other signals

building pipelines and services for AI-powered experiences
integrating models and LLMs into production services
defining evaluation methodologies and guardrail frameworks
leading cross-team technical initiatives for AI systems

About the team

Shopper Journey Services (SJS) is Zillow’s engine for authenticated user intent and a core enabler of Home Details Page (HDP) experiences. We gather rich, explicit and implicit signals from authenticated shoppers and co‑shoppers, refine them into durable models, services, and display‑ready data, and activate those capabilities across Zillow’s ecosystem to deliver personalized, high‑quality experiences. Our charter spans two tightly connected areas: User Intent & Applied AI, where we build the pipelines, services, and evaluation capabilities that turn user signals into intelligence and AI‑powered experiences, and the HDP Backend Platform, where we power HDP with robust, low‑latency, display‑ready data. AI is both a product capability and an engineering accelerator for this team, and we are investing in LLM‑ and ML‑powered workflows, evaluation, and agentic systems while modernizing our conventional backend stack.

About the role

We are looking for a P5 Principal Software Engineer, Applied AI Services to lead the architecture and evolution of our backend and AI systems across Shopper Journey Services and partner teams. In this cross‑cutting role, you will operate at the intersection of large‑scale backend systems (microservices, APIs, data stores, event‑driven architectures, ZGCP/Kubernetes, AWS‑class cloud infrastructure) and applied AI/ML systems (offline data ingestion, feature and signal pipelines, LLM/ML‑powered capabilities, evaluation frameworks, and AI‑driven workflows). You will play a leading role in how we bring AI capabilities into production services safely and repeatably, evolve our user intent flywheel and HDP backend platform, and define system‑wide standards, frameworks, and abstractions that enable many teams to ship AI‑powered features faster and with higher quality. This is a hands‑on principal IC role where you will both design and build systems yourself and provide technical leadership across multiple teams and orgs, without direct reporting lines.

You Will Get To

Architect end‑to‑end applied AI services that connect offline data ingestion, AI/ML/LLM workflows, and online services and APIs, defining shared patterns for batch and streaming data pipelines (e.g., Databricks, Spark, Kafka or equivalents), feature and signal stores, and evaluation and guardrail frameworks for AI‑powered capabilities.
Create reusable building blocks—such as libraries, templates, and reference implementations—that make it straightforward for product teams to integrate AI into their services and ship AI‑powered features faster.
Lead conventional backend and platform excellence by architecting and guiding high‑scale microservices in a Kubernetes environment, driving patterns for event‑driven architectures (including event schemas, contracts, and consumption patterns), and setting standards for databases, caching, and data access that support both HDP backend and AI use cases.
Drive AI/ML and LLM‑powered systems from prototype to production, including ingestion and transformation of training and inference data, integration of models and LLMs into online decision flows and APIs, and the definition of evaluation methodologies, metrics, and regression gates (e.g., LLM‑as‑judge, offline/online evaluation, human‑in‑the‑loop review loops).
Partner with AI/ML, Agentic AI, and data platform teams to clarify ownership boundaries and interfaces (for example, around cross‑cutting evaluation capabilities such as Evaluate MCP), and to ensure AI systems remain measurable, debuggable, and reproducible as they scale.
Lead multi‑team technical initiatives that span SJS, AI/ML teams, HDP, and other backend groups, defining and rolling out system‑wide standards and abstractions for APIs and contracts (REST/GraphQL, events, DRDCs), data schemas and lineage across offline and online paths, and observability, evaluation, and operational runbooks for AI‑powered services.
Mentor senior engineers, run deep design reviews, and champion the use of AI as a force multiplier for engineering—such as background agents for KTLO (library updates, security posture, config drift) and AI‑assisted design, implementation, and testing patterns—helping define which workflows should be agent‑assisted versus human‑led in safe, observable, and cost‑effective ways.

This role has been categorized as a teleworker position. Teleworkers do not have a permanent corporate office workplace and instead work from a physical location of their choice, which must be identified to the Company. Employees may live anywhere in Mexico, with availability to travel to Mexico City, as we recommend attendance at occasional office events.

In addition to a competitive base salary and benefits, this position is also eligible for equity awards based on factors such as experience, performance and location.

Who you are

You have 10+ years of software engineering experience with a strong track record of delivering and scaling complex, distributed backend systems in large engineering organizations.
You have built large‑scale microservices and event‑driven architectures in cloud environments (AWS or equivalent), ideally including Kubernetes, and you bring strong expertise in databases, caching, and data‑intensive services, including schema design, performance optimization, and reliability.
You are experienced with data pipelines and ML workflows (e.g., Databricks, Spark, Kafka, Airflow or equivalents) and how they connect to online systems, and you have designed system‑wide abstractions, frameworks, or platforms that are used by multiple teams.
You have hands‑on experience building or scaling AI/ML or LLM‑powered systems in production, including integrating models or LLMs into production services, owning or co‑owning data ingestion, feature pipelines, or model‑serving paths, and defining or implementing evaluation and guardrail mechanisms.
You have led cross‑team technical initiatives as an IC, influencing architecture and standards beyond a single team, and you communicate clearly with engineers, product partners, and leadership to drive clarity and alignment in ambiguous problem spaces.
You are familiar with personalization, recommendations, or user intent modeling domains, and you have experience or interest in LLM‑based workflows, including prompt design, evaluation strategies, and safety and guardrail patterns.
You have experience with modern API and integration layers, such as GraphQL or similar patterns that sit between backend services and user‑facing clients, and you have helped modernize legacy services and data paths into cohesive, platform‑aligned systems.
You care deeply about reliability, observability, and operational excellence, and you design systems for long‑term maintainability, debuggability, and measurability from the start.

Get to know us

At Zillow, we’re reimagining how people move—through the real estate market and through their careers. As the most-visited real estate platform in the U.S., we help customers navigate buying, selling, financing and renting with greater ease and confidence. Whether you're working in tech, sales, operations, or design, you’ll be part of a company that's reshaping an industry and helping more people make home a reality.

Zillow is honored to be recognized among the best workplaces in the country. Zillow was named one of the FORTUNE 100 Best Companies to Work For® in 2025 and was included on the PEOPLE Companies That Care® 2025 list, reflecting our commitment to creating an innovative, inclusive, and engaging culture where employees are empowered to grow.

No matter where you sit in the organization, your work will help drive innovation, support our customers, and move the industry—and your career—forward, together.

Zillow Group is an equal opportunity employer committed to fostering an inclusive, innovative environment with the best employees. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status. If you have a disability or special need that requires accommodation, please contact your recruiter directly.

Qualified applicants with arrest or conviction records will be considered for employment in accordance with applicable state and local law.

Los Angeles County applicants: Job duties for this position include: work safely and cooperatively with other employees, supervisors, and staff; adhere to standards of excellence despite stressful conditions; communicate effectively and respectfully with employees, supervisors, and staff to ensure exceptional customer service; and follow all federal, state, and local laws and Company policies. Criminal history may have a direct, adverse, and negative relationship with some of the material job duties of this position. These include the duties and responsibilities listed above, as well as the abilities to adhere to company policies, exercise sound judgment, effectively manage stress and work safely and respectfully with others, exhibit trustworthiness and professionalism, and safeguard business operations and the Company’s reputation. Pursuant to the Los Angeles County Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.