Lead Software Engineer - Data Platform

JPMorgan Chase · Banking · Austin, TX +1 · Commercial & Investment Bank

This is a Lead Software Engineer role on a Data Platform team within Commercial & Investment Banking – Data Analytics – Payments Technology at JPMorgan Chase. Responsibilities include designing, building, and maintaining scalable data pipelines, ETL/ELT workflows, and data platform components. The role requires experience with distributed data processing frameworks, data modeling, cloud data services, and Kubernetes. Preferred qualifications include experience with Agentic AI, LLMs, RAG, and vector databases.

What you'd actually do

  1. Design, build, and maintain scalable data pipelines and ETL/ELT workflows for batch and real-time processing using Spark, Airflow, Kafka, and Flink
  2. Develop data platform components, including data cataloging, data quality frameworks, and semantic/metrics layers, with embedded governance, lineage, and compliance standards
  3. Implement data modeling strategies (fact and dimensional, wide tables) to support analytics, reporting, and downstream consumption
  4. Partner with analytics teams, product managers, and business stakeholders to translate data requirements into production-grade solutions
  5. Lead outcomes-oriented evaluation sessions with external vendors, startups, and internal teams to probe architectural designs, technical credentials, and fit with existing systems and information architecture

Skills

Required

  • software engineering concepts
  • system design
  • application development
  • testing
  • operational stability
  • data platforms on Kubernetes
  • Apache Iceberg
  • Unity Catalog
  • OpenMetadata
  • Python
  • Java
  • SQL
  • Apache Spark
  • Apache Flink
  • data modeling techniques
  • query optimization
  • Databricks
  • Apache Airflow
  • AWS S3
  • AWS Glue
  • AWS Redshift
  • AWS Athena
  • AWS EMR
  • AWS Lake Formation
  • Software Development Life Cycle
  • agile methodologies
  • CI/CD
  • Application Resiliency
  • Security
  • cloud
  • artificial intelligence
  • machine learning
  • mobile

Nice to have

  • Agentic AI
  • LLMs
  • RAG architectures
  • vector databases
  • embedding-based retrieval systems
  • Backstage
  • data mesh
  • data product architectures
  • Infrastructure as Code
  • Terraform
  • Docker
  • Kubernetes
  • data observability
  • data quality
  • metadata management tools
  • semantic layers
  • metrics stores
  • BI platforms
  • Tableau
  • dbt Metrics

What the JD emphasized

  • Experience engineering production-grade data platforms on Kubernetes with open catalog integration (e.g., Apache Iceberg, Unity Catalog, OpenMetadata) for scalable data discovery, lineage, and governance.