Senior Lead Software Engineer

JPMorgan Chase · Banking · Jersey City, NJ +1 · Corporate Sector

JPMorgan Chase is seeking a Senior Lead Software Engineer to design, build, and optimize high-performance, low-latency distributed systems for AI/ML data platforms. The role involves architecting services, implementing infrastructure-as-code, and improving observability while partnering with ML engineers and data scientists. It emphasizes owning production outcomes and driving adoption of AI-assisted engineering practices.

What you'd actually do

  1. Architects and implements low-latency, high-throughput distributed services in Java and Spring Boot, applying object-oriented principles and strong, well-defined APIs to meet production-grade performance demands.
  2. Designs and builds resilient, cloud-native service architectures with strong high-availability (HA) requirements, ranging from three to five nines (99.9% to 99.999% availability), leveraging standard AWS compute, messaging, streaming, database, and storage services such as MSK (Kafka), SQS, S3, ECS, EKS, Lambda, KVS/KDS, RDS, DynamoDB, and Redshift.
  3. Develops and maintains infrastructure-as-code solutions using Terraform and/or CloudFormation to support scalable, repeatable, and auditable cloud deployments.
  4. Implements and continuously improves observability solutions (alerting, monitoring, and reporting) using Datadog, Dynatrace, and Splunk to deliver actionable production intelligence across microservices platforms.
  5. Translates ambiguous or evolving requirements into stable, well-modeled service designs, clearly articulating engineering tradeoffs to both technical and non-technical stakeholders.

Skills

Required

  • Java development skills
  • Spring Boot
  • object-oriented principles
  • distributed systems design
  • low-latency processing
  • AWS services (MSK, SQS, S3, ECS, EKS, Lambda, KVS/KDS, RDS, DynamoDB, Redshift)
  • infrastructure-as-code (Terraform, CloudFormation)
  • observability solutions (Datadog, Dynatrace, Splunk)
  • API design
  • testing discipline
  • debugging
  • Python
  • Go
  • Rust
  • containerization (Docker)
  • orchestration (Kubernetes)
  • communication skills
  • AI-assisted software development tools
  • responsible AI use
  • data sensitivity
  • secure handling of inputs/outputs
  • resiliency and security

Nice to have

  • high-availability (HA) requirements
  • advanced usage of AWS managed services (KVS/KDS)
  • real-time processing
  • distributed event handling
  • efficient data storage and retrieval

What the JD emphasized

  • ownership of outcomes in production
  • stable, well-modeled service designs
  • meaningful latitude to influence architecture, engineering standards, and reliability posture
  • senior-level impact
  • high-performance, low-latency distributed systems
  • resilient, cloud-native solutions
  • reliability, scalability, and performance
  • architectural and engineering decisions
  • low-latency, high-throughput
  • strong well-defined APIs
  • resilient, cloud-native service architectures
  • strong high-availability (HA) requirements
  • scalable, repeatable, and auditable cloud deployments
  • continuously improves observability solutions
  • actionable production intelligence
  • engineering tradeoffs
  • engineering best practices
  • platform operability, reliability, and maintainability
  • owns production outcomes end-to-end
  • performance bottlenecks, reliability gaps, and scalability constraints
  • robust, production-ready engineering solutions
  • culture of ownership, continuous learning, and engineering excellence
  • AI-assisted engineering practices
  • measurable validation standards
  • AI-assisted development and automation capabilities
  • 5+ years’ applied experience
  • very strong Java development skills
  • strong experience using Spring Boot
  • low-latency processing
  • production distributed systems
  • large-scale, resilient service architectures
  • production-grade environments
  • strong engineering fundamentals
  • API design
  • testing discipline
  • debugging in production contexts
  • modern programming languages
  • heavy emphasis on Java
  • clean, maintainable, OO, and testable code
  • infrastructure-as-code solutions
  • containerization and orchestration technologies
  • Docker and Kubernetes
  • communicate engineering tradeoffs clearly
  • enterprise-authorized AI-assisted software development tools
  • validating AI outputs for correctness, performance, and security
  • responsible AI use in engineering workflows
  • data sensitivity considerations
  • secure handling of inputs/outputs
  • resiliency and security expectations
  • compliant usage patterns and controls