Lead Software Engineer - Databricks, Spark, AWS

JPMorgan Chase · Banking · Plano, TX +1 · Corporate Sector

Lead Software Engineer at JPMorgan Chase focused on architecting and delivering high-throughput, low-latency data pipelines using Databricks, Apache Spark, and AWS. The role involves establishing lakehouse patterns, optimizing Databricks clusters, and driving team adoption of AI-assisted engineering practices while ensuring security and compliance. Responsibilities include job orchestration, data ingestion/transformation framework design, data quality enforcement, and Spark performance tuning.

What you'd actually do

  1. Lead architecture and delivery of high-throughput, low-latency data pipelines using Databricks and Apache Spark (Core, SQL, Structured Streaming).
  2. Establish lakehouse patterns with Delta Lake (ACID transactions, schema evolution, time travel, Z-ordering, compaction) and ensure performance at scale.
  3. Own Databricks cluster strategy and setup: runtime selection, autoscaling, driver/executor sizing, Spark configs, init scripts, cluster policies, pools, and instance profiles.
  4. Drive team adoption of enterprise-authorized AI-assisted engineering practices within the work environment to improve code quality, delivery speed, and operational outcomes (e.g., AI-assisted code review/refactoring, test strategy acceleration, incident/root-cause analysis support), while establishing consistent validation standards (secure coding, peer review, automated testing) and promoting reuse of effective patterns across the team.
  5. Apply knowledge of tools within the Software Development Life Cycle toolchain, including enterprise-authorized AI-assisted development and automation capabilities, to improve the value realized by automation.

Skills

Required

  • Formal training or certification on software engineering concepts and 5+ years applied experience.
  • 10+ years of professional software/data engineering experience, including substantial production work with Spark on Databricks or EMR.
  • Strong proficiency in Python and/or Java for data processing, platform tooling, and automation.
  • Hands-on Databricks expertise (Delta Lake, Unity Catalog, Workflows, Repos/notebooks, SQL Warehouses).
  • Solid AWS experience: S3, IAM, Glue, CloudWatch, Kinesis/MSK, and DynamoDB.
  • Demonstrated experience leading effective use of approved AI-assisted software development tools (e.g., for coding, code review, test acceleration, troubleshooting) with the ability to set team expectations for validating AI outputs for correctness, performance, and security.
  • Strong understanding of responsible AI use in engineering workflows, including data sensitivity considerations, secure handling of inputs/outputs, and adherence to resiliency and security expectations; experience coaching engineers on safe, compliant adoption within delivery practices.
  • Proven track record architecting and operating ETL/ELT pipelines (batch and streaming), with schema design/evolution, SLAs, and reliability engineering.
  • Deep skills in Spark performance tuning and Databricks cluster setup/optimization.
  • Strong SQL and analytics data modeling (dimensional/star schema; lakehouse best practices).
  • Security-first mindset: roles/instance profiles, secret management, encryption-at-rest/in-transit, and network controls.

Nice to have

  • Experience with Delta Live Tables and advanced governance (catalogs, grants, auditing) in Databricks.
  • AWS networking knowledge (VPC, subnets, routing, security groups) and data egress controls.
  • Experience with Terraform for infrastructure deployments.
  • Cost optimization experience: autoscaling strategies, spot vs on-demand, auto-termination, storage layouts and compaction.
  • Familiarity with Kafka/MSK or Kinesis Data Streams/Firehose for real-time ingestion.
  • CI/CD and automation tooling for data (Git workflows, artifact management) and testing frameworks (pytest, JUnit).
  • Observability for data systems (freshness/completeness metrics, lineage, SLAs, alerting).
  • Experience in financial services or other regulated industries.

What the JD emphasized

  • Drives team adoption of enterprise-authorized AI-assisted engineering practices within the work environment to improve code quality, delivery speed, and operational outcomes (e.g., AI-assisted code review/refactoring, test strategy acceleration, incident/root-cause analysis support), while establishing consistent validation standards (secure coding, peer review, automated testing) and promoting reuse of effective patterns across the team.
  • Applies knowledge of tools within the Software Development Life Cycle toolchain, including enterprise-authorized AI-assisted development and automation capabilities, to improve the value realized by automation.
  • Demonstrated experience leading effective use of approved AI-assisted software development tools (e.g., for coding, code review, test acceleration, troubleshooting) with the ability to set team expectations for validating AI outputs for correctness, performance, and security.
  • Strong understanding of responsible AI use in engineering workflows, including data sensitivity considerations, secure handling of inputs/outputs, and adherence to resiliency and security expectations; experience coaching engineers on safe, compliant adoption within delivery practices.