Sr Mgr, Site Reliability Engineer (SRE)

Disney · Media · Orlando, FL +1

This role provides strategic leadership for multiple SRE teams, focusing on observability, automation, and operational excellence for commerce platforms. It involves overseeing the design and delivery of scalable, fault-tolerant systems across cloud and on-prem environments, modernizing infrastructure, and leading teams in CI/CD and DevOps practices. The role requires expertise in cloud platforms, container orchestration, and infrastructure-as-code tools, with a strong emphasis on people leadership and strategic planning.

What you'd actually do

  1. Provide strategic leadership for multiple SRE teams, fostering a culture of reliability, automation, and continuous improvement.
  2. Oversee design and delivery of highly scalable, fault-tolerant systems across cloud (AWS, GCP, Azure) and on-prem environments.
  3. Implement advanced telemetry and monitoring practices, leveraging AI/ML for predictive insights and proactive reliability improvements.
  4. Guide teams in automating infrastructure and CI/CD pipelines using tools such as Terraform, Ansible, Harness, GitLab, and Kubernetes.
  5. Develop and execute departmental plans aligned with functional business objectives, ensuring resource optimization and financial integrity.

Skills

Required

  • Site Reliability Engineering
  • Systems Engineering
  • Leadership
  • Observability
  • Automation
  • Cloud platforms (AWS, GCP, Azure)
  • Container orchestration (Kubernetes)
  • Infrastructure-as-code (Terraform, CloudFormation)
  • CI/CD
  • DevOps practices
  • Communication skills
  • Stakeholder influence

Nice to have

  • Large-scale enterprise environments
  • FAANG-level engineering standards
  • Serverless architectures
  • Advanced container strategies
  • Organizational transformation
  • Scaling engineering teams
  • Advanced degree

What the JD emphasized

  • 10+ years of progressive experience in Site Reliability Engineering, Systems Engineering, or related fields, including 4+ years in leadership roles managing multiple teams.
  • Proven ability to implement observability and reliability principles across complex, distributed systems.
  • Expertise in cloud platforms (AWS, GCP, Azure), container orchestration (Kubernetes), and infrastructure-as-code tools (Terraform, CloudFormation).
  • Strong background in CI/CD, automation, and modern DevOps practices.
  • Exceptional communication and leadership skills, with experience influencing senior stakeholders and driving cross-functional initiatives.