Lead Data Engineer

JPMorgan Chase · Banking · London, United Kingdom · Corporate Sector

Lead Data Engineer at JPMorgan Chase within Personal Investing, responsible for designing, building, and operating a cloud-native data platform and pipelines for analytics, regulatory reporting, and data-driven applications. The role centers on robust, scalable, observable, and secure data solutions built with modern data engineering patterns and strong software engineering fundamentals.

What you'd actually do

  1. Design scalable, reusable data processing and data quality frameworks using Python, PySpark, and dbt (see the first sketch after this list)
  2. Build and optimize batch and streaming data pipelines with strong performance, fault tolerance, and observability
  3. Develop and operate workflow orchestration (e.g., Apache Airflow) to schedule, monitor, and manage data movement and transformations (second sketch below)
  4. Model and transform data for analytics using SQL and dbt to support business intelligence and reporting workloads
  5. Write production-grade Python/PySpark code with disciplined testing, performance tuning, and maintainable object-oriented design (third sketch below)
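
For the data quality responsibility (item 1), a minimal sketch of what one reusable check in such a framework might look like in PySpark; the `CheckResult` type, the `null_rate_check` helper, and its threshold are illustrative assumptions, not an actual JPMorgan framework.

```python
from dataclasses import dataclass

from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F


@dataclass
class CheckResult:
    """Outcome of a single data quality check."""
    name: str
    passed: bool
    detail: str


def null_rate_check(df: DataFrame, column: str, max_rate: float = 0.01) -> CheckResult:
    """Fail if the share of NULLs in `column` exceeds `max_rate`."""
    total = df.count()
    nulls = df.filter(F.col(column).isNull()).count()
    rate = nulls / total if total else 0.0
    return CheckResult(
        name=f"null_rate:{column}",
        passed=rate <= max_rate,
        detail=f"{rate:.4f} null rate over {total} rows (limit {max_rate})",
    )


if __name__ == "__main__":
    spark = SparkSession.builder.appName("dq-sketch").getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, None)], ["id", "label"])
    print(null_rate_check(df, "label", max_rate=0.25))  # passed=False: 50% nulls
```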
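
For the orchestration responsibility (item 3), a skeletal Airflow DAG wiring two dependent tasks; the DAG id, schedule, and the `extract`/`transform` callables are hypothetical placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract() -> None:
    """Placeholder: pull raw data from a source system."""


def transform() -> None:
    """Placeholder: run the PySpark/dbt transformation step."""


with DAG(
    dag_id="daily_reporting_pipeline",  # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                  # Airflow 2.4+ keyword; older versions use schedule_interval
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    extract_task >> transform_task  # run transform only after extract succeeds
```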
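
And for the testing discipline in item 5, a sketch of a pytest-based unit test around a small PySpark transformation; `add_row_hash` is a made-up example function.

```python
import pytest
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F


def add_row_hash(df: DataFrame, cols: list[str]) -> DataFrame:
    """Append a deterministic hash of `cols`, useful for change detection."""
    return df.withColumn("row_hash", F.sha2(F.concat_ws("|", *cols), 256))


@pytest.fixture(scope="session")
def spark():
    # Local single-threaded session keeps the test suite fast and hermetic.
    return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()


def test_add_row_hash_is_deterministic(spark):
    df = spark.createDataFrame([("a", 1), ("a", 1)], ["k", "v"])
    hashes = [r.row_hash for r in add_row_hash(df, ["k", "v"]).collect()]
    assert hashes[0] == hashes[1]  # identical rows must hash identically
```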

Skills

Required

  • Python
  • PySpark
  • dbt
  • SQL
  • Apache Airflow
  • AWS
  • Google Cloud
  • Azure
  • Spark
  • Kafka
  • Pub/Sub
  • Terraform
  • Docker
  • Kubernetes
  • Helm
  • Computer Science degree or equivalent

Nice to have

  • Data modeling
  • Security
  • Risk
  • Compliance
  • Governance
  • CI/CD
  • Flink
  • Trino
  • Iceberg
  • Hudi
  • Redshift
  • BigQuery
  • Snowflake

What the JD emphasized

  • 8 years of recent, hands-on professional experience coding as a data engineer
  • Strong software engineering fundamentals (system design, data structures, object-oriented programming, testing strategies, and end-to-end development lifecycle)
  • Strong Python programming skills, including unit and integration testing
  • Hands-on experience building and operating cloud-based data platforms using major cloud services (e.g., AWS, Google Cloud, or Azure)
  • Experience with large-scale distributed data processing and performance tuning
  • Hands-on experience with modern data warehousing/lakehouse technologies (e.g., warehouses such as Redshift, BigQuery, or Snowflake; engines such as Spark, Flink, or Trino; and table formats such as Iceberg, Hudi, or similar)
  • Strong SQL skills and experience with SQL-based transformation tooling (e.g., dbt)
  • Experience designing and operating orchestration pipelines using Airflow or similar tools
  • Experience designing and building streaming pipelines using Kafka, Pub/Sub, or similar messaging systems (see the sketch below)
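
As a concrete illustration of the streaming bullet, a sketch of a Spark Structured Streaming job reading from Kafka; the broker address, topic, output path, and checkpoint location are placeholders, and the job assumes the spark-sql-kafka connector is on the classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("stream-sketch").getOrCreate()

# Read events from a Kafka topic (broker and topic names are placeholders).
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "trades")
    .load()
    .select(F.col("value").cast("string").alias("payload"))
)

# Land the decoded payloads as Parquet; path and checkpoint are illustrative.
query = (
    events.writeStream.format("parquet")
    .option("path", "/data/trades")
    .option("checkpointLocation", "/chk/trades")
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()
```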