Lead Software Engineer - Java/Python, AWS, LLM

JPMorgan Chase · Banking · Mumbai, Maharashtra, India · Commercial & Investment Bank

Lead Software Engineer role focused on building and scaling LLM-enabled solutions, including multi-agent workflows and RAG systems, within a regulated fintech environment. Responsibilities include designing, developing, and supporting LLM-powered applications, implementing guardrails, ensuring system reliability, and driving AI engineering best practices.

What you'd actually do

Execute creative LLM-assisted software solutions: design, develop, and troubleshoot LLM-powered applications and services (e.g., retrieval-augmented generation, agent workflows, structured extraction, classification), thinking beyond routine approaches and in an agentic AI way to break down technical problems and deliver measurable outcomes.
Drive adoption and governance of approved AI-assisted engineering practices across teams to improve code quality, delivery speed, and operational outcomes (e.g., AI-assisted code review/refactoring, test acceleration, release readiness, incident/root-cause analysis), while establishing measurable validation standards (secure coding, peer review, automated testing) and promoting reuse of proven patterns and automation within the SDLC/TLM toolchain.
Develop data quality rules and controls using LLMs; define and enforce guardrails for prompts, retrieved context, model inputs/outputs, and post-processing, including PII redaction, toxicity/safety filters, hallucination mitigation, output schema validation, and policy compliance.
Provide Level 3 (L3) support for LLM-assisted production systems: own complex incidents, model and prompt rollouts/rollbacks, and dependency issues (vector stores, embeddings, feature stores), and ensure high availability, reliability, and adherence to SLAs, including latency and cost budgets.
Create secure, high-quality production code: implement LLM-assisted microservices, synchronous and asynchronous inference pipelines (streaming where appropriate), deterministic fallbacks, circuit breakers, and observability for reliability in production.

Skills

Required

Formal training or certification on software engineering concepts and 5+ years applied experience
Formal training or certification in software engineering concepts, with at least 1 year of practical experience applying them to LLM-enabled systems in regulated environments
Strong understanding of data modeling challenges in big data and LLM contexts, including embeddings, chunking strategies, vector similarity nuances, retrieval quality measures, and document lineage.
Demonstrated experience leading effective use of enterprise-authorized AI-assisted software development tools within the work environment (e.g., for coding, code review, test acceleration, troubleshooting) with the ability to set team expectations for validating AI outputs for correctness, performance, and security
Strong understanding of responsible AI use in engineering workflows, including data sensitivity considerations, secure handling of inputs/outputs, and adherence to resiliency and security expectations; experience coaching senior engineers/leads on compliant usage patterns and controls.
Strong coding skills in Java/Python/Athena and SQL, applied to building LLM-enabled microservices, retrieval pipelines, evaluators, and data tooling; solid understanding of data structures, algorithms, and object-oriented programming as applied to LLM latency, caching, and throughput.
Hands-on experience with AWS and cloud data management (e.g., Redshift, DynamoDB, Aurora, Databricks), plus experience integrating managed model endpoints and embedding/vector services; familiarity with secure secret management, networking, and least-privilege access.
Proficiency in automation, CI/CD, and agile methodologies

Nice to have

AWS
cloud data management
Redshift
DynamoDB
Aurora
Databricks
managed model endpoints
embedding/vector services
secure secret management
networking
least privilege access
automation
CI/CD
agile

What the JD emphasized

regulated environment
LLM enabled solutions
multi agent workflows
retrieval augmented generation
agent workflows
structured extraction
classification
LLM powered applications
LLM assisted engineering practices
LLM assisted code review/refactoring
LLM assisted development
LLM
guardrails for prompts
retrieved context
model inputs/outputs
PII redaction
toxicity/safety filters
hallucination mitigation
output schema validation
policy compliance
LLM assisted production systems
vector stores
embeddings
feature stores
LLM use cases
LLM assisted micro services
inference pipelines
model cards
RAG/agent reference architectures
prompt libraries
evaluation plans
model strengths
limitations
risk profiles
LLMs
safety profiles
context limits
determinism strategies
fine tuning vs. prompt only tradeoffs
multi agent workflows
LLM driven analysis
code generation
testing
review
LLM driven systems
LLM Ops best practices
CI/CD
LLM enabled systems
regulated environments
data modeling challenges in big data and LLM contexts
embeddings
chunking strategies
vector similarity nuances
retrieval quality measures
document lineage
enterprise-authorized AI-assisted software development tools
responsible AI use
data sensitivity considerations
secure handling of inputs/outputs
resiliency and security expectations
compliant usage patterns and controls
LLM enabled micro services
retrieval pipelines
evaluators
data tooling
data structures
algorithms
object oriented programming as applied to LLM latency
caching
throughput
managed model endpoints
embedding/vector services

Read full job description

Be an integral part of an agile team that's constantly pushing the envelope to enhance, build, and deliver top-notch technology products

As a Lead Software Engineer at JPMorgan Chase within the Commercial & Investment Bank’s MACRO technology team, you are an integral member of an agile team building secure, stable, and scalable LLM-enabled solutions. As a core technical contributor, you design and deliver controlled, well-understood LLM-assisted components and multi-agent workflows across multiple business functions in support of the firm’s objectives in a regulated environment.

Job responsibilities

Execute creative LLM-assisted software solutions: design, develop, and troubleshoot LLM-powered applications and services (e.g., retrieval-augmented generation, agent workflows, structured extraction, classification), thinking beyond routine approaches and in an agentic AI way to break down technical problems and deliver measurable outcomes.
Drive adoption and governance of approved AI-assisted engineering practices across teams to improve code quality, delivery speed, and operational outcomes (e.g., AI-assisted code review/refactoring, test acceleration, release readiness, incident/root-cause analysis), while establishing measurable validation standards (secure coding, peer review, automated testing) and promoting reuse of proven patterns and automation within the SDLC/TLM toolchain.
Apply knowledge of tools within the Software Development Life Cycle toolchain, including approved AI-assisted development and automation capabilities, to improve the value realized by automation at scale.
Develop data quality rules and controls using LLMs; define and enforce guardrails for prompts, retrieved context, model inputs/outputs, and post-processing, including PII redaction, toxicity/safety filters, hallucination mitigation, output schema validation, and policy compliance.
Provide Level 3 (L3) support for LLM-assisted production systems: own complex incidents, model and prompt rollouts/rollbacks, and dependency issues (vector stores, embeddings, feature stores), and ensure high availability, reliability, and adherence to SLAs, including latency and cost budgets.
Support BAU operations for Markets businesses: maintain and evolve LLM use cases supporting markets workflows with disciplined change management, canary releases, A/B tests, and close partnership with product, controls, and operations.
Create secure, high-quality production code: implement LLM-assisted microservices, synchronous and asynchronous inference pipelines (streaming where appropriate), deterministic fallbacks, circuit breakers, and observability for reliability in production.
Produce architecture and design artifacts: deliver model cards, system/data lineage, RAG/agent reference architectures, prompt libraries and versioning strategies, evaluation plans, and control evidence, ensuring design constraints and regulatory expectations are met during development.
Identify hidden problems and patterns: use telemetry, error analysis, prompt and context analytics, and drift detection to improve model selection, prompt strategies, retrieval quality, chunking/embedding strategies, and system architecture.
Ensure that model strengths, limitations, and risk profiles are understood, documented, and appropriately applied across different classes of software work; maintain a deep understanding of the strengths, limitations, and risk characteristics of approved LLMs (e.g., Claude, ChatGPT, and successor models), including safety profiles, context limits, determinism strategies, and fine-tuning vs. prompt-only tradeoffs.
Design multi-agent workflows that incorporate LLM-driven analysis, code generation, testing, and review with explicit human approval gates and segregation of duties.
Ensure LLM-driven systems meet enterprise reliability and resilience expectations, including disaster recovery, fallback behaviors, regional resiliency, and performance SLOs.
Drive LLM Ops best practices: integrate models, prompts, and evaluation into CI/CD; enforce approvals, segregation of duties, and reproducibility; automate regression and guardrail tests; and manage lifecycle across environments.

**Required qualifications, capabilities, and skills**

Formal training or certification on software engineering concepts and 5+ years applied experience
Formal training or certification in software engineering concepts, with at least 1 year of practical experience applying them to LLM-enabled systems in regulated environments
Strong understanding of data modeling challenges in big data and LLM contexts, including embeddings, chunking strategies, vector similarity nuances, retrieval quality measures, and document lineage.
Demonstrated experience leading effective use of enterprise-authorized AI-assisted software development tools within the work environment (e.g., for coding, code review, test acceleration, troubleshooting) with the ability to set team expectations for validating AI outputs for correctness, performance, and security.
Strong understanding of responsible AI use in engineering workflows, including data sensitivity considerations, secure handling of inputs/outputs, and adherence to resiliency and security expectations; experience coaching senior engineers/leads on compliant usage patterns and controls.
Strong coding skills in Java/Python/Athena and SQL, applied to building LLM-enabled microservices, retrieval pipelines, evaluators, and data tooling; solid understanding of data structures, algorithms, and object-oriented programming as applied to LLM latency, caching, and throughput.
Hands-on experience with AWS and cloud data management (e.g., Redshift, DynamoDB, Aurora, Databricks), plus experience integrating managed model endpoints and embedding/vector services; familiarity with secure secret management, networking, and least-privilege access.
Proficiency in automation, CI/CD, and agile methodologies with LLM Ops extensions: prompt and config versioning, automated evaluations, canary releases, and rollback strategies.
Experience in system design, application development, and operational stability for LLM architectures, including retrieval layers, vector stores, caching, observability, rate limiting, and backpressure strategies.
Strong analytical, problem-solving, and communication skills, including the ability to explain model behaviors, tradeoffs, and control decisions to both technical and non-technical stakeholders.
Provide L3 and BAU support for Markets by leveraging LLMs for incident triage, runbook retrieval, and pre-approved auto-remediation, with on-call coverage for LLM services and dependencies.
Expert-level knowledge of how large language models work and hands-on experience training and fine-tuning approved models (e.g., Claude, ChatGPT, and successors), with a proven track record integrating LLMs as controlled, reliable components of the software engineering lifecycle in regulated environments, ensuring determinism, reproducibility, safety, and traceability.

**Preferred qualifications, capabilities, and skills**

Define model usage guidelines outlining which models are appropriate for requirements analysis, code generation and refactoring, test generation, and documentation and explanation; lead the use of LLMs for structured requirements analysis, translating business and regulatory requirements into clear technical specifications and control implementations.
Establish best practices for prompt-driven design and development, treating prompts and system instructions as versioned, reviewable engineering artifacts and ensuring change control and traceability; ensure prompt strategies support determinism, reproducibility, and traceability in regulated environments (e.g., seeded examples, constrained decoding, output schemas, and canonical evaluation sets); and oversee prompt libraries and reusable patterns aligned with enterprise coding and architectural standards, including shared retrieval components and guardrail policies.
Ability to continuously learn about new developments in agentic AI and LLM-driven software coding.
