Manager Site Reliability Engineer

Workday · Enterprise · Auckland, New Zealand

Workday is seeking an Engineering Manager for its Database Reliability Engineering team. The role focuses on leading a team to ensure the performance, scale, and high availability of Workday's data infrastructure, treating infrastructure as software and leveraging open-source and cloud-native solutions. The manager will architect the future of data infrastructure, moving beyond traditional DBA roles to automated, self-healing platforms, and will mentor senior engineers.

What you'd actually do

  1. Lead our Database Reliability Engineering team
  2. Architect the future of our data infrastructure
  3. Lead a team of high-performing engineers dedicated to the resiliency, security, and scalability of our data layer
  4. Move beyond traditional DBA paradigms, replacing manual intervention with automated, self-healing platforms
  5. Mentor and develop senior engineers, fostering a culture of psychological safety and high performance

Skills

Required

  • Experience leading SRE or Database Engineering teams
  • Experience in software or systems engineering
  • Experience as an SRE/DBRE
  • Designing resilient data infrastructure
  • Implementing automated failover mechanisms
  • Database internals (engine tuning, replication topologies, and query optimization)
  • Managing databases within Kubernetes using Operators or StatefulSets
  • Spearheading high-stakes response for critical data outages
  • Reducing Mean Time to Resolution (MTTR)
  • Institutionalising RCA processes
  • Implementing robust observability stacks (Prometheus, Grafana, Datadog, or PMM)
  • Understanding of Agile/Scrum and Continual Improvement Process (CIP)
  • Managing SRE backlogs
  • Reducing "toil"
  • Automating manual database tasks
  • Mentoring and developing senior engineers
  • Leading deep-dive troubleshooting sessions (Linux internals, networking bottlenecks, distributed system latency)
  • Managing database workloads across AWS (RDS/Aurora or EC2) and GCP (Cloud SQL or GKE-hosted databases)

Nice to have

  • Bachelor’s degree in Computer Science, Engineering, or a related field
  • Understanding of Team Performance concepts
  • Ability to contribute to improving team effectiveness

What the JD emphasized

  • 3+ years of experience leading SRE or Database Engineering teams
  • 8+ years of experience in software or systems engineering, with at least 4 years as an SRE/DBRE
  • Technical expertise in database internals
  • 5+ years of experience spearheading high-stakes response for critical data outages
  • consistently reducing Mean Time to Resolution (MTTR)
  • institutionalising RCA processes to eliminate recurring systemic failures
  • Experience implementing robust observability stacks
  • Strong understanding of Agile/Scrum and Continual Improvement Process (CIP)
  • Proven ability to lead deep-dive troubleshooting sessions involving Linux internals, networking bottlenecks, and distributed system latency
  • Proven experience managing database workloads across AWS (RDS/Aurora or EC2) and GCP (Cloud SQL or GKE-hosted databases)