Lead Software Engineer - Data Engineering

JPMorgan Chase · Banking · Bengaluru, Karnataka, India · Commercial & Investment Bank

The Lead Software Engineer - Data Engineering at JPMorgan Chase is responsible for designing, building, and operating scalable data platforms and pipelines. The role focuses on creating high-quality curated datasets with clear contracts, lineage, and SLAs/SLOs, while ensuring strong controls across security, privacy, resiliency, and auditability. The engineer will also partner with product and platform teams to enable AI/ML and agentic patterns and to build search/indexing pipelines supporting RAG-style experiences. The role requires hands-on expertise in Java and Python, data modeling, orchestration, transformation frameworks, and secure engineering fundamentals, with an emphasis on responsible AI use in engineering workflows.

What you'd actually do

  1. Design, build, and operate batch and streaming data pipelines that are reliable, observable, and cost-efficient, with clear runbooks and production support ownership.
  2. Lead data modeling and curation for domain datasets, including schema evolution, data contracts, lineage expectations, and consumer-facing SLAs/SLOs.
  3. Implement robust ETL/ELT workflows with strong validation controls, including reconciliation, completeness checks, anomaly detection, and automated alerting.
  4. Engineer high-throughput data processing solutions using a combination of Java and Python, selecting the right tool for performance, maintainability, and platform standards.
  5. Build and operate orchestration capabilities (for example, Airflow or equivalent), including scheduling, backfills, retries, dependency management, and operational SLAs. Take end-to-end ownership across architecture, engineering standards, CI/CD, and operational stability in a regulated enterprise context.

Skills

Required

  • Java
  • Python
  • SQL
  • Data Modeling
  • Schema Design
  • Schema Evolution
  • Pipeline Orchestration (e.g., Airflow)
  • ETL/ELT
  • Transformation Frameworks (e.g., dbt)
  • Batch Processing
  • Streaming Data Processing
  • Search/Indexing Workflows (e.g., Elasticsearch)
  • Secure Engineering Fundamentals
  • AI-assisted software development tools

Nice to have

  • REST APIs
  • gRPC
  • Spring Batch
  • dbt
  • Airflow
  • Elasticsearch
  • OPA

What the JD emphasized

  • 5+ years applied experience
  • Demonstrated recent experience as a lead Data Engineer building and operating curated datasets and production pipelines end-to-end
  • Strong hands-on proficiency in both Java and Python in production environments
  • Advanced SQL skills, with strong capability in data modeling, schema design, and schema evolution
  • Proven experience with pipeline orchestration (for example, Airflow or equivalent), including operational controls (SLAs, alerting, retries, backfills)
  • Proven experience with transformation frameworks (for example, dbt or equivalent) and strong testing practices for transformations and data quality
  • Experience processing large-scale datasets with a clear track record of optimizing for performance, scalability, reliability, and cost
  • Experience designing and implementing large-scale batch processing jobs (for example, Spring Batch or equivalent enterprise batch frameworks)
  • Hands-on experience building and operating search/indexing workflows (for example, Elasticsearch) at scale
  • Strong SDLC discipline: code reviews, unit/integration testing, CI/CD, release hygiene, and production support ownership
  • Secure engineering fundamentals: authentication/authorization, secrets management, least privilege, secure coding, and policy enforcement patterns (including familiarity with OPA or similar policy-as-code approaches)
  • Strong communication and cross-functional leadership across engineering, product, UX, platform, and control partners
  • Demonstrated experience leading effective use of approved AI-assisted software development tools (e.g., for coding, code review, test acceleration, troubleshooting), including the ability to set team expectations for validating AI outputs for correctness, performance, and security
  • Strong understanding of responsible AI use in engineering workflows, including data sensit