Senior Director of Site Reliability Engineering

JPMorgan Chase · Banking · Palo Alto, CA +1 · Corporate Sector

The Senior Director of Site Reliability Engineering leads firmwide adoption of AI capabilities in reliability operations, sets guardrails for AI-assisted and agentic workflows, and ensures safe and scalable implementation within a regulated financial environment.

What you'd actually do

  1. Manages all team members' development by ensuring they have access to the resources they need, and collaborates across the firm to align team members with mobility opportunities that match their career aspirations
  2. Leads firmwide reuse-first adoption of enterprise-authorized AI capabilities within the work environment to accelerate reliability planning, operational learning, and delivery execution, with human-in-the-loop validation and appropriate handling of sensitive data
  3. Applies a wide range of tactics and strategies to guide internal executive decisions to achieve substantial goals
  4. Manages multiple stakeholders and complex projects and teams
  5. Implements innovative methods, techniques, and evaluation criteria for projects and people working on highly complex business issues

Skills

Required

  • Formal training or certification on site reliability engineering concepts and 10+ years applied experience
  • 5+ years of experience leading technologists to manage, anticipate, and solve complex technical items within your domain of expertise
  • Experience leading technologists to manage, anticipate, and solve complex technological issues firmwide
  • Demonstrated experience leading safe adoption of enterprise-authorized AI capabilities within the work environment at firm scale, including validation practices, data sensitivity considerations, and measurable reliability outcomes
  • Ability to define governance and decision frameworks for AI-assisted and agentic workflows, including control boundaries, auditability, and human approval checkpoints aligned to resiliency, security, and operational risk expectations
  • Experience hiring, developing, and recognizing talent
  • Prior experience influencing across highly matrixed, complex organizations and delivering value at scale
  • Experience leading complex projects supporting site reliability engineering design, scaling, resilience, and system performance assessments

Nice to have

  • Experience developing or leading cross-functional teams of technologists
  • Experience with hiring, developing, and recognizing talent
  • Experience leading a product as a Product Owner or Product Manager
  • Practical cloud native experience
  • Expertise in Computer Science, Computer Engineering, Mathematics, or a related technical field
  • Experience working at code level

What the JD emphasized

  • Leads firmwide reuse-first adoption of enterprise-authorized AI capabilities within the work environment to accelerate reliability planning, operational learning, and delivery execution, with human-in-the-loop validation and appropriate handling of sensitive data
  • Sets enterprise guardrails for AI-assisted and agentic workflows in reliability operations and delivery (e.g., approval controls, traceability/auditability, monitoring, and rollback expectations) aligned to resiliency, security, and risk standards
  • Demonstrated experience leading safe adoption of enterprise-authorized AI capabilities within the work environment at firm scale, including validation practices, data sensitivity considerations, and measurable reliability outcomes
  • Ability to define governance and decision frameworks for AI-assisted and agentic workflows, including control boundaries, auditability, and human approval checkpoints aligned to resiliency, security, and operational risk expectations

Other signals

  • Leading firmwide reuse-first adoption of enterprise-authorized AI capabilities
  • Sets enterprise guardrails for AI-assisted and agentic workflows
  • Demonstrated experience leading safe adoption of enterprise-authorized AI capabilities at firm scale
  • Ability to define governance and decision frameworks for AI-assisted and agentic workflows