Data Engineer III

JPMorgan Chase · Banking · London, United Kingdom · Corporate Sector

Data Engineer III at JPMorgan Chase in London, UK, focused on building and maintaining cloud-native data platforms and pipelines for a digital investing experience. The role involves working with lakehouse, warehousing, and streaming technologies to support analytics and regulatory reporting.

What you'd actually do

  1. Build and maintain scalable, reusable data processing and data quality frameworks using Python, PySpark, and dbt
  2. Build and operate batch and streaming data pipelines with strong scalability, performance, and fault tolerance
  3. Develop and manage workflow orchestration using tools such as Apache Airflow to support reliable, observable, and well-scheduled data movement and transformations
  4. Implement and optimize data models and warehouse structures to support analytics and business intelligence workloads
  5. Write clean, testable Python/PySpark code using object-oriented principles and unit testing
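The first and last responsibilities above come down to reusable, unit-testable data-quality code. As a purely illustrative sketch (not from the JD; plain Python rather than PySpark so it is self-contained, and all names are made up), a minimal data-quality framework might look like:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class CheckResult:
    name: str
    passed: bool
    failed_rows: int

def not_null(column: str) -> Callable[[dict], bool]:
    # Rule factory: the given column must be present and non-null.
    return lambda row: row.get(column) is not None

def run_checks(rows: list[dict], rules: dict[str, Callable[[dict], bool]]) -> list[CheckResult]:
    # Apply each named rule to every row and summarise failures,
    # so the same rules can be reused across pipelines and unit tests.
    results = []
    for name, rule in rules.items():
        failed = sum(1 for row in rows if not rule(row))
        results.append(CheckResult(name=name, passed=failed == 0, failed_rows=failed))
    return results

# Example rows: one record is missing an amount.
rows = [
    {"trade_id": 1, "amount": 100.0},
    {"trade_id": 2, "amount": None},
]
results = run_checks(rows, {"amount_not_null": not_null("amount")})
```

In a PySpark or dbt setting the same idea typically shows up as rules applied to DataFrame columns or as dbt tests, but the structure (named rules, a runner, a result summary) carries over.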

Skills

Required

  • Degree in Computer Science or a STEM-related field (or equivalent)
  • Experience working in an agile and dynamic environment
  • Experience across the software development lifecycle (requirements, design, architecture, development, testing, deployment, release, and support)
  • Hands-on experience with major cloud technologies (e.g., AWS, Google Cloud, or Azure)
  • Experience writing Python using object-oriented programming and unit/integration testing practices
  • Experience with SQL and familiarity with SQL-based workflow management tools such as dbt
  • Experience with orchestration tools such as Airflow (or similar)
  • Understanding of messaging/streaming systems such as Kafka or Pub/Sub (or similar)
  • Familiarity with infrastructure-as-code (e.g., Terraform) for cloud-based data infrastructure

Nice to have

  • Data modeling skills
  • Experience with data streaming and scalable processing frameworks (e.g., Spark, Flink, Beam, or similar)
  • Experience automating deployment, releases, and testing in continuous integration and continuous delivery pipelines
  • Experience with lakehouse patterns and table formats (e.g., Apache Iceberg)
  • Experience with federated query engines such as Trino
  • Experience designing automated tests (unit, component, integration, and end-to-end), including use of mocking frameworks
  • Experience with containers and container-based deployment environments (e.g., Docker, Kubernetes, or similar)
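On the automated-testing point above, a common pattern is to inject external dependencies (e.g., a cloud storage client) so they can be replaced with mocks in unit tests. A minimal sketch using Python's standard `unittest.mock` (the loader and client here are hypothetical, for illustration only):

```python
from unittest.mock import MagicMock

def load_partition(client, bucket: str, key: str) -> list[dict]:
    # Illustrative loader: fetch raw records from object storage via an
    # injected client, so the real service is never hit in unit tests.
    return client.get_records(bucket, key)

# Unit test: a MagicMock stands in for the real storage client.
mock_client = MagicMock()
mock_client.get_records.return_value = [{"id": 1}]

records = load_partition(mock_client, "raw", "2024/01/01.json")

assert records == [{"id": 1}]
mock_client.get_records.assert_called_once_with("raw", "2024/01/01.json")
```

Dependency injection like this is what makes unit, component, and integration tests cheap to separate: only the integration tier needs a real client.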

What the JD emphasized

  • 5 years of recent, hands-on professional experience actively coding as a data engineer