Senior Lead Site Reliability Engineer - ETS Network

JPMorgan Chase · Banking · Jersey City, NJ +1 · Corporate Sector

This role focuses on Site Reliability Engineering (SRE) within an enterprise technology environment, specifically for Electronic Trading Services. The Senior Lead SRE will define non-functional requirements, ensure service level objectives are met, and mentor other engineers. A key aspect of the role involves leveraging and leading the adoption of enterprise-authorized AI capabilities to enhance reliability design, operational decisioning, and workflows across the SDLC, while ensuring safety, security, and auditability.

What you'd actually do

  1. Creates and delivers high quality designs, roadmaps, and program charters alongside the engineering team
  2. Acts as a key resource and mentor for technologists in your area seeking advice on technical and business issues, and serves as a culture carrier and site reliability adoption champion for your team
  3. Collaborates with others to create and implement observability and reliability designs for complex systems that are robust and stable and do not incur additional toil or technical debt
  4. Uses enterprise-authorized AI capabilities within the work environment to accelerate reliability design and operational decisioning (e.g., incident/post-incident analysis and requirements traceability), validating outputs and handling operational data according to sensitivity and security requirements
  5. Leads reuse-first adoption of AI-assisted reliability workflows across SDLC/toolchain practices (e.g., testing/validation automation and production readiness), ensuring traceability/auditability, resiliency, and security controls

Skills

Required

  • Formal training or certification in site reliability engineering concepts and 5+ years of applied experience
  • Advanced understanding of site reliability culture and principles
  • Deep knowledge in one or more areas of infrastructure engineering (hardware, networking, databases, storage, deployment, automation, scaling, resilience, performance)
  • Expertise in a specific infrastructure technology and scripting languages (e.g., Python)
  • Advanced knowledge of and experience in observability (white-box and black-box monitoring, service level objectives, alerting, telemetry collection)
  • Demonstrated experience using enterprise-authorized AI capabilities within the work environment to improve reliability engineering workflows, with strong validation habits and awareness of data sensitivity
  • Ability to set team practices for safe AI usage in operations (e.g., review/approval expectations and escalation paths) while maintaining resiliency, security, and auditability outcomes
  • Commitment to developing technical and cross-functional knowledge beyond your product area
  • Advanced knowledge of software applications and technical processes
  • Demonstrated ability to communicate data-based solutions with complex reporting and visualization methods
  • Recognized as an active contributor to the engineering community

Nice to have

  • Experience with Arista, Cisco, F5, and Fortinet devices
  • Familiarity with network automation tools and techniques, such as Ansible
  • Experience with Corvil and Wireshark

What the JD emphasized

  • Uses enterprise-authorized AI capabilities within the work environment
  • Leads reuse-first adoption of AI-assisted reliability workflows
  • Demonstrated experience using enterprise-authorized AI capabilities within the work environment
  • Ability to set team practices for safe AI usage in operations