What you'd actually do

Provide strategic leadership for multiple SRE teams, fostering a culture of reliability, automation, and continuous improvement.

Oversee design and delivery of highly scalable, fault-tolerant systems across cloud (AWS, GCP, Azure) and on-prem environments.

Implement advanced telemetry and monitoring practices, leveraging AI/ML for predictive insights and proactive reliability improvements.

Guide teams in automating infrastructure and CI/CD pipelines using tools such as Terraform, Ansible, Harness, GitLab, and Kubernetes.

Develop and execute departmental plans aligned with functional business objectives, ensuring resource optimization and financial integrity.

Skills

Required

Site Reliability Engineering
Systems Engineering
Leadership
Observability
Automation
Cloud platforms (AWS, GCP, Azure)
Container orchestration (Kubernetes)
Infrastructure-as-code (Terraform, CloudFormation)
CI/CD
DevOps practices
Communication skills
Stakeholder influence

Nice to have

Large-scale enterprise environments
FAANG-level engineering standards
Serverless architectures
Advanced container strategies
Organizational transformation
Scaling engineering teams
Advanced degree

What the JD emphasized

10+ years of progressive experience in Site Reliability Engineering, Systems Engineering, or related fields, including 4+ years in leadership roles managing multiple teams.

Proven ability to implement observability and reliability principles across complex, distributed systems.

Expertise in cloud platforms (AWS, GCP, Azure), container orchestration (Kubernetes), and infrastructure-as-code tools (Terraform, CloudFormation).

Strong background in CI/CD, automation, and modern DevOps practices.

Exceptional communication and leadership skills, with experience influencing senior stakeholders and driving cross-functional initiatives.

Job Posting Title:

Sr Mgr, Site Reliability Engineer (SRE)

Req ID:

10145210

Job Description:

At Disney, we’re storytellers. We make the impossible possible. The Walt Disney Company (TWDC) is a world-class entertainment and technological leader. Walt’s passion was to continuously envision new ways to move audiences around the world—a passion that remains our touchstone in an enterprise that stretches from theme parks, resorts and a cruise line to sports, news, movies and a variety of other businesses. Uniting each endeavor is a commitment to creating and delivering unforgettable experiences — and we’re constantly looking for new ways to enhance these exciting experiences.

The Sr. Manager, Site Reliability Engineer provides strategic leadership across multiple SRE teams and their managers, ensuring alignment with organizational priorities and functional objectives while remaining accountable for performance and results in observability, automation, and operational excellence across commerce platforms. The role develops and executes departmental business, production, and organizational plans, allocating resources to achieve strategic goals. It leverages deep expertise in SRE principles to integrate reliability practices across engineering, security, and business functions, driving resilience and innovation. It also influences senior internal and external stakeholders to secure funding, shape technology strategy, and champion reliability best practices.

This role sits within the Commerce Site Reliability Engineering organization in Technology & Digital for Disney Experiences. It works closely with leaders across Commerce and Consumer Products to ensure reliability, scalability, and security for critical systems, powering magical Guest experiences. This role will report to the Director of Site Reliability Engineering.

What You’ll Do

Lead & Inspire: Provide strategic leadership for multiple SRE teams, fostering a culture of reliability, automation, and continuous improvement.
Drive Operational Excellence: Oversee design and delivery of highly scalable, fault-tolerant systems across cloud (AWS, GCP, Azure) and on-prem environments.
Champion Observability: Implement advanced telemetry and monitoring practices, leveraging AI/ML for predictive insights and proactive reliability improvements.
Modernize Infrastructure: Guide teams in automating infrastructure and CI/CD pipelines using tools such as Terraform, Ansible, Harness, GitLab, and Kubernetes.
Strategic Planning: Develop and execute departmental plans aligned with functional business objectives, ensuring resource optimization and financial integrity.
Innovation & Integration: Evaluate emerging technologies and industry trends to inform strategic decisions and maintain competitive advantage.
People Leadership: Mentor and develop leaders, set clear OKRs, and promote a diverse and inclusive culture that encourages innovation and belonging.

Required Qualifications & Skills

10+ years of progressive experience in Site Reliability Engineering, Systems Engineering, or related fields, including 4+ years in leadership roles managing multiple teams.
Proven ability to implement observability and reliability principles across complex, distributed systems.
Expertise in cloud platforms (AWS, GCP, Azure), container orchestration (Kubernetes), and infrastructure-as-code tools (Terraform, CloudFormation).
Strong background in CI/CD, automation, and modern DevOps practices.
Exceptional communication and leadership skills, with experience influencing senior stakeholders and driving cross-functional initiatives.
Comprehensive understanding of how SRE integrates with software development, security, and business operations.

Preferred Qualifications

Experience in large-scale enterprise environments and familiarity with FAANG-level engineering standards.
Knowledge of serverless architectures and advanced container strategies.
Demonstrated success in managing organizational transformation and scaling engineering teams.

Required Education

Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience). Advanced degree preferred.

#DISNEYTECH

The hiring range for this position in Orlando, Florida is $175,000-$215,000 per year. The base pay actually offered will take into account internal equity and also may vary depending on the candidate’s geographic region, job-related knowledge, skills, and experience among other factors. A bonus and/or long-term incentive units may be provided as part of the compensation package, in addition to the full range of medical, financial, and/or other benefits, dependent on the level and position offered.

Job Posting Segment:

DX Technology

Job Posting Primary Business:

Commerce

Primary Job Posting Category:

Site/System Reliability Engineer

Employment Type:

Full time

Primary City, State, Region, Postal Code:

Orlando, FL, USA

Date Posted:

2026-03-18