Machine Learning Engineer III

Expedia · Hospitality · Bangalore, India

This Machine Learning Engineer III role at Expedia Group focuses on improving existing marketing bidding systems using ML. Responsibilities include developing and refactoring ML components; designing big data and ML applications for training, evaluation, and serving; and operating ML models in production. The role involves monitoring models, diagnosing issues such as data drift and model drift, implementing guardrails, and optimizing for efficiency and observability. It requires 5+ years of experience with end-to-end ML pipelines, proficiency in Spark and ML libraries (PyTorch and/or TensorFlow), and familiarity with MLOps practices.

What you'd actually do

  1. Collaborate with peers and stakeholders across the organization to understand cross-dependencies, shape solutions, and translate experimental DS workflows into robust production pipelines.
  2. Develop, refactor, and test complex ML and software components, applying solid software engineering practices (design principles, data structures, design patterns) to produce clean, maintainable, and optimized code.
  3. Contribute to the design of big data and ML applications, including how models are trained, evaluated, and served at scale across batch and streaming (online) inference workflows.
  4. Evaluate, monitor, and operate ML models in production, instrumenting pipelines to capture online and offline metrics (latency, throughput, accuracy/quality, drift, and business KPIs) and using them to drive iteration.
  5. Diagnose model and pipeline issues (e.g., performance regressions, instability), distinguish data drift vs. model drift, and drive mitigations such as retraining, recalibration, and feature or architecture changes.
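
To make the "data drift vs. model drift" distinction in item 5 concrete, here is a minimal sketch. It is not taken from the job description. It assumes that "data drift" means a shift in an input feature's distribution, measured here with the Population Stability Index (PSI), and that "model drift" means degraded quality without a matching input shift (often called concept drift). The function names, the 0.2 PSI threshold (a common rule of thumb), and the 0.05 error tolerance are all illustrative assumptions.

```python
import math
from typing import Sequence


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between a reference sample and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference feature

    def shares(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range live values into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def diagnose(feature_psi: float, ref_error: float, live_error: float,
             psi_threshold: float = 0.2, error_tolerance: float = 0.05) -> str:
    """Hypothetical triage rule; both thresholds are assumptions, not standards."""
    inputs_shifted = feature_psi > psi_threshold
    quality_dropped = (live_error - ref_error) > error_tolerance
    if inputs_shifted and quality_dropped:
        return "data drift with quality impact: consider retraining on recent data"
    if inputs_shifted:
        return "data drift without quality impact: keep monitoring"
    if quality_dropped:
        return "model drift: the input-output relationship may have changed"
    return "stable"
```

In practice the PSI would be computed per feature over a monitoring window, and the error metric would come from delayed labels or a proxy KPI. The sketch only shows how the two signals lead to different mitigations.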

Skills

Required

  • 5+ years of relevant professional experience with end-to-end machine learning engineering pipelines in production
  • Spark (or similar big data frameworks)
  • ML libraries such as PyTorch and/or TensorFlow
  • integrating models into production services for inference at scale
  • ML fundamentals
  • deep learning
  • big data concepts
  • refactoring and scaling ML models for production
  • evaluating and monitoring ML models in production
  • MLOps practices
  • secure data access and governance
  • designing moderately complex distributed systems centered on ML training and serving

What the JD emphasized

  • improve existing systems
  • high-scale production environment
  • big data and ML applications
  • Evaluate, monitor, and operate ML models in production
  • data drift vs. model drift
  • guardrails
  • MLOps practices

Other signals

  • robust production pipelines