What you'd actually do

Designs, builds, and maintains batch and (as needed) streaming data pipelines using Databricks.

Develops and optimizes ETL/ELT workflows using PySpark / Spark SQL and Databricks workflows/jobs.

Implements data modeling (bronze/silver/gold patterns), curation, and dataset publishing for analytics and consumption.

Tunes and optimizes Spark jobs for performance, cost, and scalability (partitioning, file sizing, caching, joins, etc.).

Ensures strong data quality through validations, reconciliations, monitoring, and alerting.

Skills

Required

Data Engineering
Databricks
Python
SQL
PySpark/Spark SQL
Data modeling
ETL/ELT
performance tuning
data quality
monitoring
troubleshooting
data pipeline architecture
orchestration concepts
dependency management
data lakes/lakehouse
Git-based workflows

Nice to have

AI/ML exposure
MLflow
Databricks model registry
Delta Lake
streaming
event-driven patterns
cloud platforms
data governance
orchestration tools
production-grade data platforms

We have an exciting and rewarding opportunity for you to take your software engineering career to the next level.

As a Software Engineer III at JPMorgan Chase within Asset & Wealth Management, you serve as a seasoned member of an agile team to design and deliver trusted market-leading technology products in a secure, stable, and scalable way. You are responsible for carrying out critical technology solutions across multiple technical areas within various business functions in support of the firm’s business objectives.

Job responsibilities

Designs, builds, and maintains batch and (as needed) streaming data pipelines using Databricks.
Develops and optimizes ETL/ELT workflows using PySpark / Spark SQL and Databricks workflows/jobs.
Implements data modeling (bronze/silver/gold patterns), curation, and dataset publishing for analytics and consumption.
Tunes and optimizes Spark jobs for performance, cost, and scalability (partitioning, file sizing, caching, joins, etc.).
Ensures strong data quality through validations, reconciliations, monitoring, and alerting.
Works with stakeholders (data analysts, data scientists, product, and engineering teams) to translate requirements into data solutions.
Implements and follows CI/CD and SDLC practices for data engineering code (testing, code reviews, version control).
Supports production operations: incident triage, root-cause analysis, and pipeline reliability improvements.
Contributes to documentation, standards, and reusable frameworks to improve team productivity.

Required qualifications, capabilities, and skills

Formal training or certification in software engineering concepts and 3+ years of applied experience.
Hands-on experience in Data Engineering.
Strong experience with Databricks (jobs/workflows, notebooks, clusters, performance tuning).
Proficiency in Python and SQL; strong hands-on experience with PySpark/Spark SQL.
Experience in data modeling, ETL/ELT, performance tuning, data quality, monitoring, and troubleshooting.
Solid understanding of data pipeline architecture, orchestration concepts, and dependency management.
Experience working with data lakes/lakehouse storage patterns and file formats (e.g., Parquet).
Familiarity with Git-based workflows and engineering best practices.

Preferred qualifications, capabilities, and skills

AI/ML exposure is an added advantage: experience supporting ML workflows by building feature datasets, training/serving data pipelines, or enabling model monitoring and experimentation (e.g., working with data scientists on reproducible data inputs, feature engineering, and ML-ready tables).
Familiarity with the ML ecosystem and tools is a plus (examples: MLflow, Databricks model registry, notebook-based experimentation), as is an understanding of basic ML concepts (training vs. inference, leakage, drift).

Experience with Delta Lake features (ACID tables, time travel, optimization).

Exposure to streaming (e.g., Spark Structured Streaming) and event-driven patterns.

Experience with cloud platforms (AWS/Azure/GCP) and cloud storage integrations.

Knowledge of data governance, access controls, and secure handling of sensitive data.

Familiarity with orchestration tools (e.g., Airflow or similar) and supporting production-grade data platforms (monitoring, SLAs, on-call rotations).