Lead Data Engineer

JPMorgan Chase · Banking · London, United Kingdom · Corporate Sector

Lead Data Engineer at JPMorgan Chase within Personal Investing, responsible for designing, building, and operating a cloud-native data platform and pipelines for analytics, regulatory reporting, and data-driven applications. The role centers on robust, scalable, observable, and secure data solutions built with modern data engineering patterns and strong software engineering fundamentals.

What you'd actually do

  1. Design scalable, reusable data processing and data quality frameworks using Python, PySpark, and dbt (see the first sketch after this list)
  2. Build and optimize batch and streaming data pipelines with strong performance, fault tolerance, and observability
  3. Develop and operate workflow orchestration (e.g., Apache Airflow) to schedule, monitor, and manage data movement and transformations (second sketch below)
  4. Model and transform data for analytics using SQL and dbt to support business intelligence and reporting workloads
  5. Write production-grade Python/PySpark code with disciplined testing, performance tuning, and maintainable object-oriented design (third sketch below)
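
For the data quality responsibility (item 1), a minimal sketch of what one reusable check in such a framework might look like in PySpark; the `CheckResult` type, the `null_rate_check` helper, and its threshold are illustrative assumptions, not an actual JPMorgan framework.

```python
from dataclasses import dataclass

from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F


@dataclass
class CheckResult:
    """Outcome of a single data quality check."""
    name: str
    passed: bool
    detail: str


def null_rate_check(df: DataFrame, column: str, max_rate: float = 0.01) -> CheckResult:
    """Fail if the share of NULLs in `column` exceeds `max_rate`."""
    total = df.count()
    nulls = df.filter(F.col(column).isNull()).count()
    rate = nulls / total if total else 0.0
    return CheckResult(
        name=f"null_rate:{column}",
        passed=rate <= max_rate,
        detail=f"{rate:.4f} null rate over {total} rows (limit {max_rate})",
    )


if __name__ == "__main__":
    spark = SparkSession.builder.appName("dq-sketch").getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, None)], ["id", "label"])
    print(null_rate_check(df, "label", max_rate=0.25))  # passed=False: 50% nulls
```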
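
For the orchestration responsibility (item 3), a skeletal Airflow DAG wiring two dependent tasks; the DAG id, schedule, and the `extract`/`transform` callables are hypothetical placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract() -> None:
    """Placeholder: pull raw data from a source system."""


def transform() -> None:
    """Placeholder: run the PySpark/dbt transformation step."""


with DAG(
    dag_id="daily_reporting_pipeline",  # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                  # Airflow 2.4+ keyword; older versions use schedule_interval
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    extract_task >> transform_task  # run transform only after extract succeeds
```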
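
And for the testing discipline in item 5, a sketch of a pytest-based unit test around a small PySpark transformation; `add_row_hash` is a made-up example function.

```python
import pytest
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F


def add_row_hash(df: DataFrame, cols: list[str]) -> DataFrame:
    """Append a deterministic hash of `cols`, useful for change detection."""
    return df.withColumn("row_hash", F.sha2(F.concat_ws("|", *cols), 256))


@pytest.fixture(scope="session")
def spark():
    # Local single-threaded session keeps the test suite fast and hermetic.
    return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()


def test_add_row_hash_is_deterministic(spark):
    df = spark.createDataFrame([("a", 1), ("a", 1)], ["k", "v"])
    hashes = [r.row_hash for r in add_row_hash(df, ["k", "v"]).collect()]
    assert hashes[0] == hashes[1]  # identical rows must hash identically
```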

Skills

Required

  • Python
  • PySpark
  • dbt
  • SQL
  • Apache Airflow
  • AWS
  • Google Cloud
  • Azure
  • Spark
  • Kafka
  • Pub/Sub
  • Terraform
  • Docker
  • Kubernetes
  • Helm
  • Computer Science degree or equivalent

Nice to have

  • Data modeling
  • Security
  • Risk
  • Compliance
  • Governance
  • CI/CD
  • Flink
  • Trino
  • Iceberg
  • Hudi
  • Redshift
  • BigQuery
  • Snowflake

What the JD emphasized

  • 8 years of recent, hands-on professional experience coding as a data engineer
  • Strong software engineering fundamentals (system design, data structures, object-oriented programming, testing strategies, and end-to-end development lifecycle)
  • Strong Python programming skills, including unit and integration testing
  • Hands-on experience building and operating cloud-based data platforms using major cloud services (e.g., AWS, Google Cloud, or Azure)
  • Experience with large-scale distributed data processing and performance tuning
  • Hands-on experience with modern data warehousing/lakehouse technologies (e.g., warehouses such as Redshift, BigQuery, or Snowflake; engines such as Spark, Flink, or Trino; and table formats such as Iceberg, Hudi, or similar)
  • Strong SQL skills and experience with SQL-based transformation tooling (e.g., dbt)
  • Experience designing and operating orchestration pipelines using Airflow or similar tools
  • Experience designing and building streaming pipelines using Kafka, Pub/Sub, or similar messaging systems (see the sketch below)
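
As a concrete illustration of the streaming bullet, a sketch of a Spark Structured Streaming job reading from Kafka; the broker address, topic, output path, and checkpoint location are placeholders, and the job assumes the spark-sql-kafka connector is on the classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("stream-sketch").getOrCreate()

# Read events from a Kafka topic (broker and topic names are placeholders).
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "trades")
    .load()
    .select(F.col("value").cast("string").alias("payload"))
)

# Land the decoded payloads as Parquet; path and checkpoint are illustrative.
query = (
    events.writeStream.format("parquet")
    .option("path", "/data/trades")
    .option("checkpointLocation", "/chk/trades")
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()
```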