Senior Machine Learning Engineer

Expedia · Hospitality · Madrid, Spain

Senior Machine Learning Engineer at Expedia Group, responsible for architecting, building, and operating ML infrastructure for training, deployment, and serving. Focuses on data pipelines, optimization, and ML Ops practices for ranking, recommendation, and pricing systems. Requires strong software engineering, distributed systems, and ML Ops expertise.

What you'd actually do

  1. Architect and own end-to-end ML infrastructure for training, deployment, and serving across batch and real-time environments.
  2. Define and drive technical standards for ML systems, including reliability, observability, and performance benchmarks.
  3. Lead the design and implementation of scalable data pipelines for large-scale feature engineering and model training.
  4. Own system performance and reliability across the ML platform, proactively identifying and resolving bottlenecks.
  5. Establish and evolve ML Ops practices, including CI/CD pipelines, monitoring, alerting, and A/B testing frameworks.

Skills

Required

  • Python
  • TensorFlow or PyTorch
  • ML algorithms
  • Model architectures
  • ML infrastructure
  • AWS or GCP
  • Docker
  • Kubernetes
  • Spark
  • Kafka
  • ML model serving technologies
  • CI/CD tooling
  • ML platform design
  • Mentoring engineers
  • Technical discussions
  • Cross-functional alignment

Nice to have

  • Software engineering
  • Data engineering
  • Distributed systems
  • Applied machine learning

What the JD emphasized

  • track record of delivering production ML systems
  • deep, proven experience in at least two of the following: software engineering, data engineering, distributed systems, or applied machine learning
  • deep proficiency in Python
  • hands-on experience with ML frameworks like TensorFlow or PyTorch
  • deep understanding of ML algorithms, model architectures, and the infrastructure required to build scalable, reliable ML systems
  • proficient with cloud platforms (e.g., AWS, GCP), containerization (Docker, Kubernetes), and distributed data systems (e.g., Spark, Kafka)
  • proven experience with ML model serving technologies (e.g., MLflow, TensorFlow Serving), CI/CD tooling, and ML platform design

Other signals

  • ML infrastructure
  • ML Ops
  • production ML systems
  • ranking problems
  • recommendation engines
  • pricing optimization