Lead Software Engineer - Java/Python, AWS, LLM

JPMorgan Chase · Banking · Mumbai, Maharashtra, India · Commercial & Investment Bank

Lead Software Engineer role focused on building and scaling LLM-enabled solutions, including multi-agent workflows and RAG systems, within a regulated fintech environment. Responsibilities include designing, developing, and supporting LLM-powered applications, implementing guardrails, ensuring system reliability, and driving AI engineering best practices.

What you'd actually do

  1. Execute creative LLM-assisted software solutions: design, develop, and troubleshoot LLM-powered applications and services (e.g., retrieval-augmented generation, agent workflows, structured extraction, classification), thinking beyond routine approaches and in an agentic-AI way to break down technical problems and deliver measurable outcomes.
  2. Drive adoption and governance of approved AI-assisted engineering practices across teams to improve code quality, delivery speed, and operational outcomes (e.g., AI-assisted code review/refactoring, test acceleration, release readiness, incident/root-cause analysis), while establishing measurable validation standards (secure coding, peer review, automated testing) and promoting reuse of proven patterns and automation within the SDLC/TLM toolchain.
  3. Develop data quality rules and controls using LLMs; define and enforce guardrails for prompts, retrieved context, model inputs/outputs, and post-processing, including PII redaction, toxicity/safety filters, hallucination mitigation, output schema validation, and policy compliance.
  4. Provide Level 3 (L3) support for LLM-assisted production systems: own complex incidents, model and prompt rollouts/rollbacks, and dependency issues (vector stores, embeddings, feature stores), and ensure high availability, reliability, and adherence to SLAs, including latency and cost budgets.
  5. Create secure, high-quality production code: implement LLM-assisted microservices, synchronous and asynchronous inference pipelines (streaming where appropriate), deterministic fallbacks, circuit breakers, and observability for reliability in production.
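
To make the guardrail and fallback responsibilities above more concrete, here is a minimal Python sketch of input PII redaction, output schema validation, and a deterministic fallback. It is illustrative only and not taken from the posting: the regex patterns, the `label`/`confidence` output contract, the allowed labels, and the fallback value are all hypothetical assumptions, and real PII detection would need far broader coverage.

```python
import json
import re

# Hypothetical patterns for illustration; real PII detection needs much more.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{8,}\d")

# Hypothetical output contract: field name -> expected Python type.
REQUIRED_FIELDS = {"label": str, "confidence": float}
ALLOWED_LABELS = {"complaint", "inquiry", "other"}

# Deterministic fallback returned whenever the model output fails validation.
FALLBACK = {"label": "other", "confidence": 0.0}


def redact_pii(text: str) -> str:
    """Mask emails and phone-like digit runs before text reaches the model."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def validate_output(raw: str) -> dict:
    """Parse model output as JSON and check it against the assumed schema.

    Returns the parsed dict on success, or a copy of FALLBACK on any violation.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return dict(FALLBACK)
    if not isinstance(data, dict):
        return dict(FALLBACK)
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            return dict(FALLBACK)
    if data["label"] not in ALLOWED_LABELS:
        return dict(FALLBACK)
    if not 0.0 <= data["confidence"] <= 1.0:
        return dict(FALLBACK)
    return data
```

In this sketch, `redact_pii` would run on prompts and retrieved context before inference, and `validate_output` would run on the raw model response, so downstream code always receives a dict in the expected shape.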

Skills

Required

  • Formal training or certification in software engineering concepts and 5+ years of applied experience.
  • Formal training or certification in software engineering concepts, with at least 1 year of practical experience applying them to LLM-enabled systems in regulated environments.
  • Strong understanding of data modeling challenges in big data and LLM contexts, embeddings, chunking strategies, vector similarity nuances, retrieval quality measures, and document lineage.
  • Demonstrated experience leading effective use of enterprise-authorized AI-assisted software development tools within the work environment (e.g., for coding, code review, test acceleration, troubleshooting), with the ability to set team expectations for validating AI outputs for correctness, performance, and security.
  • Strong understanding of responsible AI use in engineering workflows, including data sensitivity considerations, secure handling of inputs/outputs, and adherence to resiliency and security expectations; experience coaching senior engineers/leads on compliant usage patterns and controls.
  • Strong coding skills in Java/Python/Athena and SQL, applied to building LLM-enabled microservices, retrieval pipelines, evaluators, and data tooling; solid understanding of data structures, algorithms, and object-oriented programming as applied to LLM latency, caching, and throughput.
  • Hands-on experience with AWS and cloud data management (e.g., Redshift, DynamoDB, Aurora, Databricks), plus experience integrating managed model endpoints and embedding/vector services; familiarity with secure secret management, networking, and least-privilege access.
  • Proficiency in automation, CI/CD, and agile meth
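
As a rough illustration of the retrieval topics named in the requirements above (chunking strategies, vector similarity, retrieval quality measures), the following Python sketch shows overlapping fixed-size chunking, cosine similarity, and recall@k. The function names, the chunk size and overlap parameters, and the choice of recall@k as the quality measure are assumptions made for this example, not details from the posting.

```python
import math


def chunk_text(words: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split a token list into fixed-size windows sharing `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        # Stop once a window reaches the end, to avoid a redundant tail chunk.
        if start + size >= len(words):
            break
    return chunks


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recall_at_k(retrieved_ids: list[str], relevant_ids: list[str], k: int) -> float:
    """Fraction of relevant documents that appear in the top-k retrieved list."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = len(set(retrieved_ids[:k]) & relevant)
    return hits / len(relevant)
```

The overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk, at the cost of storing some tokens twice; cosine similarity ignores vector magnitude, which is one of the nuances to keep in mind when embeddings are not normalized.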

Nice to have

  • AWS
  • cloud data management
  • Redshift
  • DynamoDB
  • Aurora
  • Databricks
  • managed model endpoints
  • embedding/vector services
  • secure secret management
  • networking
  • least privilege access
  • automation
  • CI/CD
  • agile

What the JD emphasized

  • regulated environment
  • LLM enabled solutions
  • multi agent workflows
  • retrieval augmented generation
  • agent workflows
  • structured extraction
  • classification
  • LLM powered applications
  • LLM assisted engineering practices
  • LLM assisted code review/refactoring
  • LLM assisted development
  • LLM
  • guardrails for prompts
  • retrieved context
  • model inputs/outputs
  • PII redaction
  • toxicity/safety filters
  • hallucination mitigation
  • output schema validation
  • policy compliance
  • LLM assisted production systems
  • vector stores
  • embeddings
  • feature stores
  • LLM use cases
  • LLM assisted micro services
  • inference pipelines
  • model cards
  • RAG/agent reference architectures
  • prompt libraries
  • evaluation plans
  • model strengths
  • limitations
  • risk profiles
  • LLMs
  • safety profiles
  • context limits
  • determinism strategies
  • fine tuning vs. prompt only tradeoffs
  • LLM driven analysis
  • code generation
  • testing
  • review
  • LLM driven systems
  • LLM Ops best practices
  • CI/CD
  • LLM enabled systems
  • regulated environments
  • data modeling challenges in big data and LLM contexts
  • chunking strategies
  • vector similarity nuances
  • retrieval quality measures
  • document lineage
  • enterprise-authorized AI-assisted software development tools
  • responsible AI use
  • data sensitivity considerations
  • secure handling of inputs/outputs
  • resiliency and security expectations
  • compliant usage patterns and controls
  • LLM enabled micro services
  • retrieval pipelines
  • evaluators
  • data tooling
  • data structures
  • algorithms
  • object oriented programming as applied to LLM latency
  • caching
  • throughput
  • managed model endpoints
  • embedding/vector services

Other signals

  • LLM assisted components