Lead Software Engineer - Data Engineer

JPMorgan Chase · Banking · Bengaluru, Karnataka, India · Consumer & Community Banking

A Lead Software Engineer role in Data Engineering, focused on building an AI/Agentic AI layer for a Data Products platform. The work involves architecting and developing autonomous agents that manage the data product lifecycle, using NLP, RAG, Vector Databases, and multi-agent orchestration. The role also requires hands-on technical leadership, mentoring a team, and ensuring scalability, security, and best practices across enterprise-grade data platforms.

What you'd actually do

  1. Lead, mentor, and grow a high-performing team of 5–7 engineers across multiple workstreams, fostering a culture of innovation, ownership, and technical excellence.
  2. Operate as a player-coach, providing hands-on architectural guidance while empowering the team to own and deliver work independently.
  3. Architect and own the end-to-end technical design of the Data Products Studio: a scalable, enterprise-grade platform that orchestrates the discovery, design, build, and productionization of data products from the CCB Data Lake and Snowflake.
  4. Design the platform's AI/Agentic AI layer, leveraging intent agents, NLP Text-to-SQL, Knowledge Graphs (KAG), RAG, Vector Databases, and Agent-to-Agent (A2A) communication to enable intelligent, automated data product creation and natural language interaction with the data estate.
  5. Lead the design and development of the Agentic AI capabilities that power the Data Products Framework, including discovery agents that autonomously profile and recommend data product candidates, design agents that auto-generate data contracts and schema recommendations, build agents that generate and optimize data pipelines, governance agents that auto-apply entitlements based on data classification, and quality agents that detect anomalies and drift and trigger self-healing remediation.

Skills

Required

  • Python
  • SQL
  • Java 17+
  • Spring Boot
  • system design
  • distributed systems
  • ETL/ELT pipelines
  • batch and real-time data processing
  • PySpark
  • DataFrame API
  • Dataset API
  • Spark SQL
  • React
  • Angular
  • AWS cloud services
  • S3
  • Athena
  • Glue
  • Lambda
  • Step Functions
  • IAM
  • KMS
  • Terraform
  • Snowflake
  • LLMs
  • RAG architectures
  • Vector Databases
  • NLP
  • agentic frameworks
  • data governance
  • metadata management
  • data lineage
  • access control
  • data classification
  • policy enforcement
  • Grafana

Nice to have

  • Knowledge Graphs (KAG)
  • Agent-to-Agent (A2A) communication

What the JD emphasized

  • architecting and delivering large-scale, enterprise-grade data platforms or frameworks from concept through production in a large corporate environment
  • AI/ML-powered platforms or applications
  • agentic frameworks

Other signals

  • AI/Agentic AI layer
  • orchestrates the discovery, design, build, and productionization of data products
  • autonomous discovery agents
  • design agents
  • build agents
  • governance agents
  • quality agents
  • Agent-to-Agent communication layer
  • multi-agent orchestration
  • NLP Text-to-SQL