Lead Software Engineer - Data Engineer at JPMorgan Chase

What you'd actually do

Lead, mentor, and grow a high-performing team of 5 to 7 engineers across multiple workstreams, fostering a culture of innovation, ownership, and technical excellence.

Operate as a player-coach — providing hands-on architectural guidance while empowering the team to own and deliver independently.

Architect and own the end-to-end technical design of the Data Products Studio — a scalable, enterprise-grade platform that orchestrates the discovery, design, build, and productionization of data products from the CCB Data Lake and Snowflake.

Design the platform's AI/Agentic AI layer, leveraging intent agents, NLP Text-to-SQL, Knowledge Graphs (KAG), RAG, Vector Databases, and Agent-to-Agent (A2A) communication to enable intelligent, automated data product creation and natural language interaction with the data estate.

Lead the design and development of Agentic AI capabilities that power the Data Products Framework, including autonomous discovery agents that profile and recommend data product candidates, design agents that auto-generate data contracts and schema recommendations, build agents that generate and optimize data pipelines, governance agents that auto-apply entitlements based on data classification, and quality agents that detect anomalies and drift and trigger self-healing remediation.

Skills

Required

Python
SQL
Java 17+
Spring Boot
system design
distributed systems
ETL/ELT pipelines
batch and real-time data processing
PySpark
DataFrame API
Dataset API
Spark SQL
React
Angular
AWS cloud services
S3
Athena
Glue
Lambda
Step Functions
IAM
KMS
Terraform
Snowflake
LLMs
RAG architectures
Vector Databases
NLP
agentic frameworks
data governance
metadata management
data lineage
access control
data classification
policy enforcement
Grafana

Nice to have

Knowledge Graphs (KAG)
Agent-to-Agent (A2A) communication

We have an opportunity to impact your career and provide an adventure where you can push the limits of what's possible.

As a Lead Software Engineer at JPMorganChase within the Marketing Automation Platforms Team, you are an integral part of an agile team that works to enhance, build, and deliver trusted market-leading technology products in a secure, stable, and scalable way. As a core technical contributor, you are responsible for delivering critical technology solutions across multiple technical areas within various business functions in support of the firm’s business objectives.

Job responsibilities

Lead, mentor, and grow a high-performing team of 5 to 7 engineers across multiple workstreams, fostering a culture of innovation, ownership, and technical excellence.
Set the technical vision and engineering roadmap for the Data Products platform, aligning with firmwide priorities.
Operate as a player-coach — providing hands-on architectural guidance while empowering the team to own and deliver independently.
Drive cross-functional collaboration with platform teams, domain Data Product Owners, AI/ML teams and governance teams.
Architect and own the end-to-end technical design of the Data Products Studio — a scalable, enterprise-grade platform that orchestrates the discovery, design, build, and productionization of data products from the CCB Data Lake and Snowflake.
Design the platform's AI/Agentic AI layer, leveraging intent agents, NLP Text-to-SQL, Knowledge Graphs (KAG), RAG, Vector Databases, and Agent-to-Agent (A2A) communication to enable intelligent, automated data product creation and natural language interaction with the data estate.
Establish and enforce architectural standards, design patterns, and engineering best practices across the team — ensuring scalability, security, resilience, and maintainability.
Lead the design and development of Agentic AI capabilities that power the Data Products Framework, including autonomous discovery agents that profile and recommend data product candidates, design agents that auto-generate data contracts and schema recommendations, build agents that generate and optimize data pipelines, governance agents that auto-apply entitlements based on data classification, and quality agents that detect anomalies and drift and trigger self-healing remediation.
Architect the Agent-to-Agent communication layer enabling multi-agent orchestration across the data product lifecycle — from discovery through productionization.
Leverage RAG (Retrieval Augmented Generation) and Vector Databases to enable contextual, knowledge-grounded AI interactions with metadata, lineage, and data catalog information.
Implement NLP Text-to-SQL capabilities allowing business users to explore the CCB Data Lake and Snowflake using natural language, lowering the barrier to data product discovery.

Required qualifications, capabilities, and skills

- Proven track record of architecting and delivering large-scale, enterprise-grade data platforms or frameworks from concept through production in a large corporate environment.
- Deep hands-on expertise in Python, SQL, and at least one additional language (Java 17+, Spring Boot), with strong system design and distributed systems knowledge.
- Extensive experience designing, building, and optimizing ETL/ELT pipelines at scale, including batch and real-time data processing.
- Strong proficiency in PySpark for distributed data processing, including DataFrame and Dataset APIs and Spark SQL.
- Experience working with UI frameworks (React, Angular).
- Extensive experience with AWS cloud services including S3, Athena, Glue, Lambda, Step Functions, IAM, KMS, and Terraform.
- Basic knowledge of Snowflake (architecture, performance optimization, Tasks, Streams, Stored Procedures, Materialized Views, security model).
- Experience designing and building AI/ML-powered platforms or applications, with working knowledge of LLMs, RAG architectures, Vector Databases, NLP, and agentic frameworks.
- Deep understanding of data governance principles including metadata management, data lineage, access control (RBAC/ABAC), data classification, and policy enforcement.
- Experience with Grafana or equivalent observability platforms for custom dashboards, APM, SLA monitoring, and alerting.