Data Engineer II - Content Intelligence

Booking · Hospitality · Amsterdam, Netherlands · Data Engineering

This Data Engineer II role focuses on building and optimizing data pipelines that support Generative AI foundation models and supervised fine-tuning. It involves working with large text and image datasets, ensuring data quality for ML models, and collaborating with data scientists and engineers to deliver production-level ML solutions.

What you'd actually do

  1. Rapidly developing next-generation scalable, flexible, and high-performance data pipelines.
  2. Dealing with massive textual sources to train GenAI foundation models.
  3. Solving issues with data and data pipelines, prioritizing based on customer impact.
  4. End-to-end ownership of data quality in our core datasets and data pipelines.
  5. Experimenting with new tools and technologies to meet business requirements regarding performance, scaling, and data quality.

Skills

Required

  • production data pipelines in the cloud
  • schema design
  • data modeling
  • Python
  • Java
  • PySpark
  • Apache Flink
  • Snowflake
  • MySQL
  • Cassandra
  • DynamoDB
  • data warehousing
  • ETL/ELT pipelines

Nice to have

  • data processing for large-scale language models like GPT, BERT, or similar architectures
  • NumPy
  • pandas
  • matplotlib
  • experimental design
  • A/B testing
  • evaluation metrics for ML models
  • products that impact a large customer base

What the JD emphasized

  • building ML models
  • production level ML solutions
  • train GenAI foundation models

Other signals

  • Generative AI innovation
  • train GenAI foundation models
  • building ML models