Software Engineer - Site Reliability

Workday · Enterprise · Dublin, Ireland

Software Engineer focused on building reliable, scalable systems, software, and processes for the Workday Data Platform. The role involves designing, analyzing, and troubleshooting large-scale distributed systems, automating processes, and building infrastructure and tooling in the cloud using technologies like Spark, Hadoop, Kubernetes, AWS, GCP, Docker, Terraform, Ansible, Jenkins, Prometheus, and Grafana. The primary focus is on CI/CD, infrastructure as code, and ensuring efficient delivery and operation of software for other engineers.

What you'd actually do

  1. You have experience in designing, analyzing, and troubleshooting large-scale distributed systems built on technologies like Spark, YARN, Hadoop, Kubernetes, Polaris, Iceberg, and Trino.
  2. You love to work in Unix/Linux from kernel to shell, file systems, client-server protocols, etc.
  3. You have a strong coding background and can utilize various languages. We focus on building tooling and automation using Python, GoLang and Java.
  4. You prefer building infrastructure and tooling in the cloud and using managed services where possible. We focus on AWS and GCP.
  5. You package and deliver immutable services and functions, utilizing Docker, Kubernetes and Serverless frameworks (AWS Lambda, API Gateway)

Skills

Required

  • 5+ years of experience in software development engineering, including designing, developing, and deploying software solutions
  • 2+ years of coding experience across various languages (Python, GoLang and Java)
  • MS in Computer Science or related field and 1 year of relevant experience, or BS in Computer Science or related field and 3 years of relevant experience
  • Experience in designing, analyzing, and troubleshooting large-scale distributed systems built on technologies like Spark, YARN, Hadoop, Kubernetes
  • Experience building infrastructure and tooling in the cloud and using managed services where possible, with a focus on AWS
  • Working knowledge of building immutable services and functions utilizing Docker, Kubernetes and Serverless frameworks (AWS Lambda, API Gateway)
  • Working knowledge of building Highly Available, Scalable, Reliable multi-tenanted big data applications on Cloud (AWS, GCP) and/or Data Center architectures

Nice to have

  • Unix/Linux from kernel to shell, file systems, client-server protocols
  • Python, GoLang and Java
  • AWS and GCP
  • Docker, Kubernetes and Serverless frameworks (AWS Lambda, API Gateway)
  • Terraform, Ansible
  • Jenkins, TeamCity, Bamboo, Artifactory
  • Prometheus, Grafana
  • Experience debugging and tuning JVMs

What the JD emphasized

  • large-scale distributed systems
  • building infrastructure and tooling in the cloud
  • immutable services and functions
  • repeatable
  • Infrastructure as Code
  • CI/CD
  • meaningful metrics and alerts