Senior Machine Learning Engineer at Expedia

What you'd actually do

Design and own high-throughput, low-latency ML systems (2000+ RPS) for TravelAds, including multi-service training and serving architectures, auction and ranking models, and real-time inference services that meet strict sub-100ms SLAs.

Build and evolve ML infrastructure and data foundations – feature stores, online/offline feature pipelines, embedding and vector services, and data lineage and versioning – that power ad relevance, bidding optimization, experimentation, and model evaluation at scale.

Accelerate the end-to-end ML lifecycle by automating training, validation, deployment, shadow testing, A/B testing, and retraining using orchestrated workflows (e.g., Flyte, Airflow) and robust quality gates.

Develop agentic AI and LLM/RAG-powered workflows that automate ML operations (training, deployment, validation, monitoring, calibration) and enable AI-assisted dataset creation, operational analysis, and decision support.

Define and implement ML observability, reliability, and cost guardrails through drift and feature-freshness monitoring, health dashboards, SLO/SLI definitions, incident response, and resilience-focused improvements.

Skills

Required

Python
Java/Kotlin/Scala
distributed systems
data structures
performance optimization
system design (HLD/LLD)
serving stacks
monitoring and observability
rollbacks
operational rigor
technical design for multi-quarter ML projects
partnering with Product and business stakeholders

Nice to have

real-time ML inference at high throughput
Spark
Hive
Databricks
Airflow
Flyte
AWS SageMaker
EKS
EMR
Docker
CI/CD for ML
automated training pipelines
deployment orchestration
data lineage and versioning
drift detection
feature-freshness monitoring
model health dashboards
offline/online parity validation
incident response
root cause analysis
LLM productionization
RAG architectures
agentic AI workflows

What the JD emphasized

high-throughput, low-latency ML systems

sub-100ms SLAs

end-to-end ML lifecycle

agentic AI and LLM/RAG-powered workflows

ML observability, reliability, and cost guardrails

Proven track record of designing, building, and operating production ML or large-scale distributed systems

real-time ML inference at high throughput (1000+ RPS) and strict latency SLAs

LLM productionization, RAG architectures, or agentic AI workflows

Expedia Group brands power global travel for everyone, everywhere. We design cutting-edge tech to make travel smoother and more memorable, and we create groundbreaking solutions for our partners. Our diverse, vibrant, and welcoming community is essential in driving our success.

Why Join Us?

To shape the future of travel, people must come first. Guided by our Values and Leadership Agreements, we foster an open culture where everyone belongs and differences are celebrated, and we know that when one of us wins, we all win.

We provide a full benefits package, including exciting travel perks, generous time-off, parental leave, a flexible work model (with some pretty cool offices), and career development resources, all to fuel our employees' passion for travel and ensure a rewarding career journey. We’re building a more open world. Join us.

Introduction to the team

The EG Advertising Platform Machine Learning Engineering team builds and operates the ML systems behind TravelAds, Expedia Group’s performance advertising marketplace generating over $1.3B in annual revenue. Our ML Orchestrator processes ~128 million requests per day at 99.9% availability with 25–45ms latency, ranking and scoring ads across multiple traveler experiences. We are transforming how ML models move from idea to production by automating the end-to-end lifecycle — from training and validation to deployment and monitoring — and by building agentic AI workflows that accelerate experimentation and unlock new advertising capabilities.

If you are excited about designing ML systems that automate the entire ML lifecycle while shipping LLM-powered solutions for ad relevance, golden dataset generation, and live inference at scale, this role is for you. This is a team where you won’t just deploy models — you’ll reshape how an advertising ML platform operates at scale.

In this role, you will:

Design and own high-throughput, low-latency ML systems (2000+ RPS) for TravelAds, including multi-service training and serving architectures, auction and ranking models, and real-time inference services that meet strict sub-100ms SLAs.
Build and evolve ML infrastructure and data foundations – feature stores, online/offline feature pipelines, embedding and vector services, and data lineage and versioning – that power ad relevance, bidding optimization, experimentation, and model evaluation at scale.
Accelerate the end-to-end ML lifecycle by automating training, validation, deployment, shadow testing, A/B testing, and retraining using orchestrated workflows (e.g., Flyte, Airflow) and robust quality gates.
Develop agentic AI and LLM/RAG-powered workflows that automate ML operations (training, deployment, validation, monitoring, calibration) and enable AI-assisted dataset creation, operational analysis, and decision support.
Define and implement ML observability, reliability, and cost guardrails through drift and feature-freshness monitoring, health dashboards, SLO/SLI definitions, incident response, and resilience-focused improvements.
Safely integrate and operate AI/ML-enabled solutions that improve outcomes, while setting technical direction, mentoring MLEs to operate independently, and leading cross-team initiatives that elevate ML engineering practices and business impact.

Minimum Qualifications:

Bachelor’s degree in Computer Science or a related technical field, or equivalent professional experience.
8+ years of relevant professional experience.
Proven track record of designing, building, and operating production ML or large-scale distributed systems, including system design (HLD/LLD), serving stacks, monitoring and observability, rollbacks, and operational rigor.
Strong software engineering foundation in Python and at least one of Java/Kotlin/Scala, with deep understanding of distributed systems, data structures, and performance optimization.
Experience leading technical design for multi-quarter ML projects and partnering with Product and business stakeholders to define problems, make clear trade-offs, and measure the business impact of ML systems.

Preferred Qualifications:

Experience with real-time ML inference at high throughput (1000+ RPS) and strict latency SLAs.
Expertise with big data technologies such as Spark, Hive, Databricks and workflow orchestration tools such as Airflow and Flyte, as well as cloud-native ML platforms and infrastructure (e.g., AWS SageMaker, EKS, EMR, Docker).
Experience building ML lifecycle automation – CI/CD for ML, automated training pipelines, deployment orchestration, and robust data lineage and versioning – plus ML observability systems including drift detection, feature-freshness monitoring, model health dashboards, and offline/online parity validation.
Track record of leading incident response and root cause analysis for ML or other mission-critical services, and driving sustained improvements in reliability, resilience, and operational excellence.
Familiarity with AI-driven systems, tools, or workflows and applying AI/ML concepts to improve real-world products and engineering outcomes, including experience with LLM productionization, RAG architectures, or agentic AI workflows in high-scale environments.

Accommodation requests

If you need assistance with any part of the application or recruiting process due to a disability, or other physical or mental health conditions, please reach out to our Recruiting Accommodations Team through the Accommodation Request.

We are proud to be named a Best Place to Work on Glassdoor in 2024 and to be recognized for our award-winning culture by organizations like Forbes, TIME, Disability:IN, and others.

Expedia Group's family of brands includes: Brand Expedia®, Hotels.com®, Expedia® Partner Solutions, Vrbo®, trivago®, Orbitz®, Travelocity®, Hotwire®, Wotif®, ebookers®, CheapTickets®, Expedia Group™ Media Solutions, Expedia Local Expert®, CarRentals.com™, and Expedia Cruises™. © 2024 Expedia, Inc. All rights reserved. Trademarks and logos are the property of their respective owners. CST: 2029030-50

Employment opportunities and job offers at Expedia Group will always come from Expedia Group’s Talent Acquisition and hiring teams. Never provide sensitive, personal information to someone unless you’re confident who the recipient is. Expedia Group does not extend job offers via email or any other messaging tools to individuals with whom we have not made prior contact. Our email domain is @expediagroup.com. The official website to find and apply for job openings at Expedia Group is careers.expediagroup.com/jobs.

Expedia is committed to creating an inclusive work environment with a diverse workforce. All qualified applicants will receive consideration for employment without regard to race, religion, gender, sexual orientation, national origin, disability or age.