Senior Machine Learning Engineer - Distribution & Supply

Expedia · Hospitality · Seattle, WA

Senior Machine Learning Engineer role focused on building and optimizing machine-learning-driven systems for Expedia's Distribution & Supply team. The role covers end-to-end delivery of ML features, from problem framing and data sourcing to deployment, monitoring, and operational support, with the goal of improving the quality and performance of the distribution platform. Responsibilities also include system design, API design, data modeling, and mentoring other engineers.

What you'd actually do

  1. Design, build, and evolve robust, scalable machine learning systems and services, including low‑level system design (LLD), API design, and data modeling, to power complex product capabilities across multiple domains.
  2. Own end‑to‑end delivery of machine learning features and platforms, from problem framing, data sourcing, feature engineering, and model development and evaluation through implementation, testing, deployment, monitoring, and ongoing operational support.
  3. Collaborate with product, data, and engineering teams to translate ambiguous business and customer problems into clear ML‑driven solutions, selecting appropriate modeling approaches and integrating them into production services and applications.
  4. Improve model and system quality, reliability, and performance by driving best practices in experimentation, validation, observability, security, and operational excellence for the ML services you own.
  5. Mentor and support other engineers and data practitioners through technical design discussions, review of modeling and code work, and knowledge sharing, helping to elevate ML engineering practices across teams and domains.

Skills

Required

  • Python or Java
  • core software engineering concepts
  • low‑level system design (LLD)
  • API design
  • data modeling
  • ML fundamentals including model training, evaluation, and deployment
  • service‑oriented or microservice architectures
  • building and consuming APIs
  • large‑scale data pipelines
  • reliability, scalability, and security of ML‑backed services
  • operating ML workflows in production environments
  • monitoring model and data health
  • responding to incidents
  • improving systems based on experimental results and operational feedback

Nice to have

  • architecting and evolving complex, distributed ML platforms or systems
  • high‑volume, low‑latency prediction workloads
  • large‑scale batch inference
  • clear, well‑versioned API contracts
  • resilient data models
  • leading technical design for ML‑driven features or services
  • making sound tradeoffs between modeling complexity, performance, and operational cost
  • aligning solutions with broader domain or organizational standards
  • driving operational excellence for ML systems
  • improving observability of models and data
  • reducing manual toil through automation (for example, CI/CD for models, feature stores, or model registry workflows)
  • enhancing performance, resilience, or cost efficiency
  • AI‑driven systems, tools, or workflows
  • applying AI/ML concepts to real‑world products
  • designing and running experiments
  • using metrics and analytics to guide model iteration
  • managing model lifecycle (retraining, versioning, and rollout strategies)
  • advanced AI/ML tooling and infrastructure
  • distributed training frameworks
  • modern ML platforms
  • inference optimization techniques
  • delivering robust, scalable, and trustworthy ML solutions

What the JD emphasized

  • advanced machine learning engineering
  • robust, scalable machine learning systems
  • end‑to‑end delivery of machine learning features and platforms
  • integrate them into production services and applications
  • operational excellence for the ML services
  • operating ML workflows in production environments
  • architecting and evolving complex, distributed ML platforms or systems
  • high‑volume, low‑latency prediction workloads
  • operational cost
  • operational excellence for ML systems
  • modern ML platforms
  • inference optimization techniques

Other signals

  • machine learning systems
  • deploy and scale robust models
  • integrate them into production services and applications
  • operational excellence for the ML services