Principal Site Reliability Engineer

Autodesk · Enterprise · Kraków, Poland

Principal Site Reliability Engineer for the Autodesk PDMS Platform SRE team, focused on architecting and implementing hosting solutions for dynamic SaaS web applications and ensuring reliability and performance at scale. Responsibilities include designing and maintaining Infrastructure-as-Code, implementing security and compliance practices, automating processes, defining and monitoring SLOs/SLIs, and participating in on-call support and incident management.

What you'd actually do

  1. Architect and implement hosting solutions for highly dynamic SaaS web applications, ensuring reliability and performance at scale.
  2. Design, implement, and maintain Infrastructure-as-Code solutions to support scalable, reliable, and secure global environments.
  3. Implement security and compliance best practices across infrastructure and applications, including hardening and enforcing least privilege.
  4. Define and monitor Service Level Objectives (SLOs), Service Level Indicators (SLIs), and manage error budgets to ensure reliability goals are met.
  5. Participate in on-call support and incident management, ensuring timely resolution and clear communication.

Skills

Required

  • Linux administration
  • AWS
  • Docker
  • Kubernetes
  • Terraform
  • CloudFormation
  • CI/CD tools
  • log analysis
  • monitoring tools
  • relational databases
  • SQL
  • Bash
  • Python
  • Perl

Nice to have

  • AWS preferred

What the JD emphasized

  • 8+ years DevOps/SRE experience
  • Advanced hands-on Linux administration skills
  • Experience managing large-scale cloud infrastructure
  • Proven track record of architecting large-scale, highly available systems
  • Expert-level knowledge of AWS services