Senior Infrastructure Engineer - Sana

Workday · Enterprise · Stockholm, Sweden (+1 other location)

Workday is seeking a Senior Infrastructure Engineer for its Sana AI lab, which builds AI for work. The role focuses on building and maintaining the technical foundation for Sana's AI products, including cloud infrastructure, deployment systems, and developer tooling, with an emphasis on reliability and performance. The engineer will support feature development teams and act as a Site Reliability Engineer (SRE).

What you'd actually do

  1. Serve as the backbone for our ambitious goals by keeping our infrastructure robust and scalable.
  2. Support feature development teams with infrastructure decisions and deployments.
  3. Act as a Site Reliability Engineer (SRE) and continuously improve our Developer Experience (DX).
  4. Design and implement scalable cloud infrastructure solutions.

Skills

Required

  • GCP or equivalent cloud platforms
  • Kubernetes
  • Terraform
  • Backend systems design
  • Backend application coding
  • Site reliability engineering
  • Developer experience
  • CI/CD pipelines
  • Deployment automation
  • Observability
  • Logging
  • Metrics
  • Alerting
  • Incident response

Nice to have

  • TypeScript
  • Postgres
  • Redis

What the JD emphasized

  • 3+ years of experience in infrastructure, platform, or site reliability engineering.
  • 3+ years of hands-on experience with GCP or equivalent cloud platforms, including managing production environments.
  • 3+ years of experience with Kubernetes, including deployment, scaling, and operations.
  • 3+ years of experience with infrastructure-as-code tooling, such as Terraform.
  • 3+ years of experience designing and operating highly available, scalable, and reliable backend systems.
  • 2+ years of experience coding backend applications, with a strong track record of writing reliable, maintainable services.
  • Experience acting as an SRE, with a strong understanding of site reliability principles in production environments.
  • Experience improving developer experience through internal tooling, CI/CD pipelines, and deployment automation.
  • Strong observability skills, including logging, metrics, alerting, and incident response.