Data Engineer

Lyft · Consumer · Toronto, ON · Mapping

Data Engineer on the Mapping team responsible for architecting, building, and maintaining scalable data pipelines and services to support route simulation, experimentation, analytics, and machine learning models. The role involves working with AWS, Kubernetes, and Apache Airflow, and requires strong experience with Spark, Python, and various database technologies.

What you'd actually do

  1. Own the core data pipelines in Mapping, scaling data processing to keep pace with rapid data growth at Lyft
  2. Develop strong subject matter expertise in the systems you manage, setting and managing SLAs for both data pipelines and datasets
  3. Continuously evolve data models and schemas to meet business and engineering requirements
  4. Develop tools that support self-service management of data pipelines (ETL) and schema evolution, and perform SQL tuning to optimize data processing performance
  5. Write clean, well-tested, and maintainable code, prioritizing scalability and cost efficiency
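The pipeline-ownership duties above amount to running dependency-ordered ETL steps, which an orchestrator like Airflow manages at scale. A minimal, library-free sketch of that idea, using Python's standard-library `graphlib` — the task names and toy data here are illustrative, not part of the posting:

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical ETL steps for a mapping pipeline; in production an
# orchestrator such as Airflow would own scheduling, retries, and SLAs.
def extract_routes():
    return [{"route_id": 1, "distance_km": 12.5}]

def transform_routes(rows):
    # Derive miles from kilometres for each route row.
    return [{**r, "distance_mi": round(r["distance_km"] * 0.621371, 2)} for r in rows]

def load_routes(rows):
    # Stand-in for a warehouse write (e.g. to Hive or Iceberg).
    return len(rows)

# Express task dependencies as a DAG: transform depends on extract,
# load depends on transform.
dag = {"transform": {"extract"}, "load": {"transform"}}
order = list(TopologicalSorter(dag).static_order())

results = {}
for task in order:
    if task == "extract":
        results[task] = extract_routes()
    elif task == "transform":
        results[task] = transform_routes(results["extract"])
    elif task == "load":
        results[task] = load_routes(results["transform"])
```

The topological sort guarantees each step runs only after its upstream dependencies, which is the core contract an Airflow DAG enforces.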

Skills

Required

  • Spark
  • Python
  • SQL
  • Data Pipelines
  • ETL
  • Data Modeling
  • Cloud (AWS)
  • Kubernetes
  • Airflow
  • Database, querying, and streaming technologies (S3, DynamoDB, HDFS, Hive, Presto, Pig, HBase, Parquet, Iceberg, Flink, Spark Streaming, Kafka)
  • Data quality tools (Great Expectations, dbt, Monte Carlo, Soda, Collibra)
  • Geospatial data querying
  • Performance tuning
  • Workflow management tools (Airflow, Oozie, Azkaban, UC4, Prefect)
  • Infrastructure tooling (Terraform, CloudFormation, Docker, Kubernetes, Ansible, Chef, Puppet)
  • API schema definition
  • Backend services development
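Several of the required skills (data quality tooling, data modeling) reduce to asserting expectations against datasets. A minimal sketch of the kind of check that tools like Great Expectations formalize — the column names and rules here are hypothetical:

```python
# Hypothetical row-level quality checks in the spirit of Great Expectations;
# real tools add expectation suites, profiling, and reporting on top.
def check_not_null(rows, column):
    """Every row must have a non-null value in the given column."""
    return all(r.get(column) is not None for r in rows)

def check_in_range(rows, column, lo, hi):
    """Every value in the column must fall within [lo, hi]."""
    return all(lo <= r[column] <= hi for r in rows)

trips = [
    {"trip_id": "a1", "duration_min": 14},
    {"trip_id": "a2", "duration_min": 52},
]

report = {
    "trip_id_not_null": check_not_null(trips, "trip_id"),
    "duration_in_range": check_in_range(trips, "duration_min", 0, 240),
}
```

In practice such checks run as a pipeline step, failing the run (or alerting) when an expectation is violated rather than silently loading bad data.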

Nice to have

  • Ruby
  • Bash
  • MySQL
  • PostgreSQL
  • SQL Server
  • Oracle

What the JD emphasized

  • 4+ years of relevant professional experience
  • Strong experience with Spark
  • Experience with disparate database, querying, and streaming technologies such as S3, DynamoDB, HDFS, Hive, Presto, Pig, HBase, Parquet, Iceberg, Flink, Spark Streaming, Kafka
  • Experience with data quality tools such as Great Expectations, dbt, Monte Carlo, Soda, Collibra
  • Strong understanding of SQL engines, experience querying geospatial data, and the ability to conduct advanced performance tuning
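The geospatial-querying emphasis typically means radius and containment filters over lat/lon data, which SQL engines expose through functions like Presto's `ST_Distance`. A standard-library sketch of the underlying computation — the pickup points and reference location are made up for illustration:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical pickup points around Toronto; keep those within 5 km of a
# reference point, as a geospatial SQL engine would with a distance predicate.
reference = (43.6453, -79.3806)  # near downtown Toronto
pickups = [
    {"id": "p1", "lat": 43.6532, "lon": -79.3832},  # downtown, ~1 km away
    {"id": "p2", "lat": 43.7615, "lon": -79.4111},  # North York, ~13 km away
]
nearby = [p for p in pickups
          if haversine_km(reference[0], reference[1], p["lat"], p["lon"]) < 5.0]
```

At warehouse scale the same filter would be pushed into the engine (with a spatial index or geohash partitioning) rather than computed row by row in Python.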