SRE / DevOps Engineer (GoLang)

Workday · Enterprise · Auckland, New Zealand

Workday is seeking a Site Reliability Engineer (SRE) / DevOps Engineer with GoLang experience to join its Cloud Platform SRE team in Auckland, New Zealand. The role focuses on the reliability, availability, and observability of the cloud platform that hosts Workday's engineering and customer environments. The team uses a cloud-native tech stack that includes Kubernetes, Istio, OPA, GoLang, Prometheus, and Grafana, follows scrum practices, and shares on-call duty under a follow-the-sun model. Responsibilities include identifying and resolving reliability issues, designing automation solutions, developing SLIs/SLOs, partnering with service teams on SRE standards, and contributing to incident response and root cause analysis. The ideal candidate has 1-8 years of SRE/DevOps experience, hands-on experience with distributed systems in public cloud environments (AWS, GCP, or Azure), and proficiency in GoLang, Python, or Ruby, with Go preferred.

What you'd actually do

  1. Identify, diagnose, and resolve reliability and performance issues across distributed cloud environments, including Kubernetes clusters
  2. Design and implement automation solutions that improve operational efficiency, reduce manual effort, and enable the team to operate at scale
  3. Develop and launch effective SLIs, and ensure SLOs are met, by building an extensible observability architecture, automating runbooks, and establishing new processes
  4. Partner with platform service teams to craft and implement SRE standards for their respective services, defining benchmarks and automation to qualify services for production environments
  5. Collaborate with global SRE counterparts in Pleasanton, Atlanta, and Dublin to share knowledge, align on best practices, and deliver consistent operational outcomes
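
As an illustration of the SLI/SLO work described in item 3, here is a minimal Go sketch that computes an availability SLI and the share of error budget left in a window. The `SLO` type, the `SLI` and `BudgetRemaining` functions, and the sample numbers are all hypothetical; they are assumptions for illustration and are not taken from the job description or from Workday's tooling.

```go
package main

import (
	"errors"
	"fmt"
)

// SLO is a hypothetical availability objective, e.g. Target 0.999 for "99.9%".
type SLO struct {
	Name   string
	Target float64
}

// SLI returns the fraction of good events in a window.
// A window with no traffic is treated as fully healthy (assumption).
func SLI(good, total uint64) float64 {
	if total == 0 {
		return 1.0
	}
	return float64(good) / float64(total)
}

// BudgetRemaining returns the fraction of the error budget still unspent.
// 1.0 means nothing spent, 0 means exhausted, negative means overspent.
func BudgetRemaining(s SLO, good, total uint64) (float64, error) {
	if s.Target <= 0 || s.Target >= 1 {
		return 0, errors.New("target must be strictly between 0 and 1")
	}
	if good > total {
		return 0, errors.New("good events cannot exceed total events")
	}
	if total == 0 {
		return 1, nil // no traffic, so no budget has been spent
	}
	allowedBad := (1 - s.Target) * float64(total) // bad events the SLO tolerates
	bad := float64(total - good)
	return 1 - bad/allowedBad, nil
}

func main() {
	// Sample numbers are made up for the example.
	slo := SLO{Name: "api-availability", Target: 0.999}
	good, total := uint64(999_600), uint64(1_000_000)

	remaining, err := BudgetRemaining(slo, good, total)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s: SLI=%.4f, error budget remaining=%.1f%%\n",
		slo.Name, SLI(good, total), remaining*100)
}
```

In practice the good and total counts would likely come from a metrics backend such as Prometheus rather than from hard-coded values.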

Skills

Required

  • site reliability engineering
  • DevOps
  • cloud-native infrastructure
  • distributed systems
  • public cloud environment (AWS, GCP, or Azure)
  • Kubernetes
  • GoLang
  • Linux/Unix
  • software development practices
  • iterative delivery approaches
  • agile methodologies
  • CI/CD pipelines

Nice to have

  • EKS
  • GKE
  • Istio
  • OPA
  • Prometheus
  • Grafana
  • Python
  • Ruby

What the JD emphasized

  • AI platform for managing people, money, and agents