Data Engineer III, RAG and Gen AI

Expedia · Hospitality · Gurgaon, India

Data Engineer III role focused on building and operating data pipelines and embeddings workflows for Agentic AI applications, including RAG systems and vector databases, within a consumer travel tech company.

What you'd actually do

  1. Design and develop cloud-native solutions that are scalable, responsive, and resilient.
  2. Build scalable ingestion pipelines for structured and unstructured data (documents, logs, knowledge bases, transactional data)
  3. Design semantic layers and context-building strategies for LLM consumption
  4. Architect and build production-ready RAG systems (retrieval pipelines, embeddings, vector indexing, ranking strategies), working with vector databases and retrieval systems
  5. Develop embedding pipelines and manage vector databases at scale

Skills

Required

  • 6+ years of development experience in an enterprise-level engineering environment, with increasing levels of technical expertise.
  • 4+ years of hands-on backend data engineering application development experience, with an excellent understanding of products built on a microservice architecture.
  • Proven hands-on experience designing, building, and operating data pipelines that enable LLM-based agentic AI systems, including support for embeddings, retrieval layers, and orchestration workflows.
  • Expert-level SQL and strong Python proficiency
  • Experience with distributed processing frameworks (Spark, Databricks, Flink, etc.)
  • Experience building data pipelines in cloud-native environments (AWS/GCP/Azure)
  • Experience building scalable, fault-tolerant, observable systems
  • Good knowledge of data structures and algorithms.
  • Strong understanding of data modeling and semantic layer design
  • Understanding of embeddings, chunking strategies, retrieval optimization, and re-ranking

Nice to have

  • Java is a plus

What the JD emphasized

  • Proven hands-on experience designing, building, and operating data pipelines that enable LLM-based agentic AI systems, including support for embeddings, retrieval layers, and orchestration workflows.

Other signals

  • designing, building, deploying, and operating data pipelines
  • embeddings workflows to power Agentic AI applications
  • architecture decisions
  • drive AI platform evolution
  • enterprise-grade reliability, governance, and scalability