Data Engineer III

Expedia · Hospitality · IL

Data Engineer III role focused on building and supporting scalable data solutions, with an emphasis on designing and delivering LLM/AI-driven data solutions within Expedia's Ad Tech business. The work involves using vector databases and the Model Context Protocol (MCP) for retrieval-augmented insights, and building batch and streaming pipelines that generate embeddings and supply data to LLM applications. The role requires strong data engineering skills with technologies such as Spark, Flink, and cloud services, plus collaboration with product, business, and science teams.

What you'd actually do

  1. Design, build, and support scalable, durable data solutions that enable self-service consumption use cases, using cloud-based technologies in an agile manner.
  2. Support Expedia Group’s product and business teams’ specific data needs on a global scale.
  3. Write clean, efficient and thoroughly tested code.
  4. Develop scalable, highly performant distributed systems, with everything this entails (availability, monitoring, resiliency).
  5. Design and deliver scalable LLM/AI-driven data solutions within ADAM (Advertising Data, Attribution, and Measurement), leveraging vector databases and the Model Context Protocol (MCP) to power retrieval-augmented insights for Expedia Group’s Ad Tech business.
  6. Build and operate robust batch and streaming pipelines that generate embeddings and provision privacy-safe, high-quality data to LLM applications.
  7. Partner with Ads science, product, and sales to surface actionable insights for campaign performance, attribution, audience quality, forecasting, and yield optimization.

Skills

Required

  • Apache Spark
  • Apache Flink
  • AWS Cloud Services
  • Scala
  • Java
  • Cassandra
  • MongoDB
  • SQL
  • NoSQL
  • GraphQL
  • Datadog
  • Splunk
  • Master’s degree in Computer Science, Data Science, Analytics or related field, and 3 years of experience in the job offered or in a Data Engineering-related occupation.

What the JD emphasized

  • LLM/AI-driven data solutions
  • vector databases
  • Model Context Protocol (MCP)
  • retrieval-augmented insights
  • batch and streaming pipelines that generate embeddings
  • provision privacy-safe, high-quality data to LLM applications
