Senior Data Engineer I - GenAI Foundation Models

Booking · Hospitality · Tel Aviv, Israel · Data Engineering

Senior Data Engineer role focused on building and optimizing data pipelines for GenAI foundation models and supervised fine-tuning. The role involves processing massive textual and image data, ensuring data quality, and providing tools for ML scientists and the analytics community. It requires experience with cloud data pipelines, big data frameworks, and database systems, with a focus on enabling ML model training and product improvement.

What you'd actually do

  1. Rapidly developing next-generation scalable, flexible, and high-performance data pipelines.
  2. Processing massive textual sources to train GenAI foundation models.
  3. Diagnosing and resolving issues with data and data pipelines, prioritizing fixes by customer impact.
  4. Owning data quality end to end across our core datasets and data pipelines.
  5. Experimenting with new tools and technologies to meet business requirements for performance, scaling, and data quality.
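To make items 2 and 4 concrete, here is a minimal, hypothetical sketch of the kind of data-quality step such pipelines include: exact deduplication and length filtering of raw text records before they feed a foundation-model training set. It is written in plain Python for illustration; at the scale the role describes this logic would run in a distributed framework such as PySpark or Flink. All names and thresholds are assumptions, not from the job description.

```python
import hashlib

def clean_corpus(records, min_chars=20):
    """Deduplicate and length-filter raw text records.

    Exact dedup via SHA-256 content hashing after whitespace
    normalization; drops records shorter than min_chars.
    A single-machine stand-in for the distributed version.
    """
    seen = set()
    kept = []
    for text in records:
        norm = " ".join(text.split())  # collapse whitespace
        if len(norm) < min_chars:
            continue  # too short to be useful training text
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate of an earlier record
        seen.add(digest)
        kept.append(norm)
    return kept

raw = [
    "Book a stay in Tel Aviv with a sea view.",
    "Book  a stay in Tel Aviv with a sea view.",  # duplicate after normalization
    "ok",                                         # below the length floor
    "Family-friendly hotels near the old port.",
]
print(clean_corpus(raw))
```

Real pipelines typically add near-duplicate detection (e.g. MinHash), language identification, and PII scrubbing on top of this; the structure above is just the skeleton.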

Skills

Required

  • Python
  • Java
  • PySpark
  • Apache Flink
  • Snowflake
  • MySQL
  • Cassandra
  • DynamoDB
  • Data Warehousing
  • ETL/ELT pipelines
  • Schema design
  • Data modeling
  • Cloud data pipelines
  • Data lakes
  • Serverless solutions

Nice to have

  • data processing for large-scale language models like GPT, BERT, or similar architectures
  • NumPy
  • pandas
  • matplotlib
  • experimental design
  • A/B testing
  • evaluation metrics for ML models
  • products that impact a large customer base

What the JD emphasized

  • Minimum of 6 years of experience as a Data Engineer or a similar role, with a consistent record of successfully delivering ML/Data solutions.
  • You have built production data pipelines in the cloud, setting up data-lake and serverless solutions; you have hands-on experience with schema design and data modeling, and you have worked with ML scientists and ML engineers to deliver production-level ML solutions.
  • Experience with big data processing frameworks such as PySpark, Apache Flink, or Snowflake.
  • Experience in data processing for large-scale language models such as GPT, BERT, or similar architectures is an advantage.

Other signals

  • data pipelines
  • GenAI foundation models
  • ML scientists
  • data quality
  • petabytes of data