Machine Learning Engineer II

Expedia · Hospitality · Bangalore, India

Machine Learning Engineer II role focused on building and operating ML systems for Expedia Group's advertising platform, which generates over $1.3B in annual revenue. The role involves designing, developing, testing, and maintaining scalable ML services, optimizing and productionizing ML models, and owning the end-to-end lifecycle of ML features. Emphasis on automation, orchestration, AI-assisted workflows, and improving operational quality and ML lifecycle efficiency at scale.

What you'd actually do

  1. Design, develop, test, and maintain scalable, resilient, and secure machine learning services and components that power Expedia Group products and platforms.
  2. Collaborate with product managers, data scientists, architects, and other engineers to translate business and customer requirements into robust ML system designs, including low-level design, API design, and data models for training and serving.
  3. Implement, optimize, and productionize ML models, writing clean, maintainable, and well‑documented code, automated tests, and tooling that improve reliability, observability, and operational excellence for ML pipelines and services.
  4. Participate in code reviews, design reviews, and technical discussions, identifying opportunities to simplify ML systems, reduce technical debt, and improve performance, quality, and cost efficiency across multiple services or domains.
  5. Own the end‑to‑end lifecycle of ML features and services, including data preparation, training, deployment, monitoring, incident response, and incremental improvement, and safely integrate and operate AI/ML‑enabled solutions that improve outcomes.

Skills

Required

  • Bachelor’s degree in Computer Science or a related technical field, or equivalent professional experience.
  • 2+ years of relevant professional experience.
  • Hands‑on proficiency in at least one modern programming language and its ecosystem, with experience in system design (LLD), API design, and data modeling for ML‑driven, service‑oriented, or microservices architectures.
  • Experience owning ML features or services through development, experimentation, testing, deployment, and operational support, including monitoring, troubleshooting, and resolving production issues.
  • Solid understanding of core computer science and ML engineering fundamentals such as data structures, algorithms, distributed systems concepts, model lifecycle management, and secure coding and data handling practices.

Nice to have

  • Experience designing and implementing scalable, fault‑tolerant, and high‑throughput ML services, including well‑structured online/offline data models and APIs that serve multiple teams or domains.
  • Demonstrated track record of improving ML service availability, performance, and reliability using metrics, observability, automation, and strong operational practices such as canary releases and automated rollback.
  • Background integrating or leveraging AI/ML‑enabled platforms, feature stores, and training/serving infrastructure within production systems, and safely operating AI‑driven features to enhance customer and business outcomes.
  • Experience working across multiple technical domains or layers of the stack (for example, data pipelines, model training, and model serving APIs), adapting quickly to new ML frameworks, platforms, and AI‑driven tooling.
  • Ability to contribute to and influence ML system designs within a team or product area, making data‑driven decisions using experimentation and evaluation metrics, and helping evolve engineering and ML best practices for responsible and safe AI.

What the JD emphasized

  • scalable, resilient, and secure machine learning services
  • low-level design, API design, and data models for training and serving
  • implement, optimize, and productionize ML models
  • reliability, observability, and operational excellence for ML pipelines and services
  • end‑to‑end lifecycle of ML features and services
  • safely integrate and operate AI/ML‑enabled solutions

Other signals

  • ML systems powering advertising marketplace
  • Process 128 million daily requests at 99.9% availability with 25-45 ms latency
  • Investing in automation, orchestration, and AI-assisted workflows
  • Improve how ML models move from idea to production
  • Reduce cycle time across the ML lifecycle
  • Build reliable ML systems
  • Improve operational quality
  • Contribute to production ML at meaningful scale
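
To put the scale figures above in perspective, here is a rough back-of-the-envelope sketch of what 128 million daily requests at 99.9% availability could mean in practice. The listing does not say how availability is measured or how traffic is distributed over the day, so the function name, the even-traffic assumption, and both readings of availability (per request and per unit of time) are illustrative assumptions, not Expedia's definitions.

```python
# Back-of-the-envelope scale arithmetic for the figures quoted in the listing.
# Assumptions (not from the listing): traffic is spread evenly over the day,
# and availability can be read either per request or per unit of time.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def scale_summary(daily_requests: int, availability: float) -> dict:
    """Return average throughput and the daily 'error budget' implied by an availability target."""
    unavailable_fraction = 1.0 - availability
    return {
        # Average only; real peak traffic is likely higher than this.
        "avg_requests_per_second": daily_requests / SECONDS_PER_DAY,
        # If availability is measured per request: requests allowed to fail per day.
        "daily_error_budget_requests": daily_requests * unavailable_fraction,
        # If availability is measured over time: seconds of downtime allowed per day.
        "daily_downtime_budget_seconds": SECONDS_PER_DAY * unavailable_fraction,
    }


summary = scale_summary(daily_requests=128_000_000, availability=0.999)
for name, value in summary.items():
    print(f"{name}: {value:,.1f}")
```

Under these assumptions the average load works out to roughly 1,480 requests per second, with a daily budget of about 128,000 failed requests or about 86 seconds of downtime. Peak load and the actual service-level definitions would need to come from the team itself.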