Site Reliability Engineer

Autodesk · Enterprise · Bangalore, India

Site Reliability Engineer on the Autodesk Customer Success Technical Advisory team, focusing on SaaS web applications. Responsibilities include architecting hosting solutions, implementing Infrastructure-as-Code, ensuring reliability and performance, managing cloud infrastructure (AWS preferred), and automating processes. The role also involves defining and monitoring SLOs/SLIs, participating in on-call support, and conducting post-incident reviews. The posting lists familiarity with AI-assisted operations (AIOps) as a minimum requirement.

What you'd actually do

  1. Architect and implement hosting solutions for highly dynamic SaaS web applications, ensuring reliability and performance at scale
  2. Design, implement, and maintain Infrastructure-as-Code solutions to support scalable, reliable, and secure global environments
  3. Define and monitor Service Level Objectives (SLOs) and Service Level Indicators (SLIs), and manage error budgets to ensure reliability goals are met
  4. Design and maintain monitoring, logging, and observability frameworks to ensure full visibility into system health
  5. Participate in on-call support and incident management, ensuring timely resolution and clear communication

Skills

Required

  • DevOps/SRE experience
  • Linux administration
  • Incident management platforms
  • AI-assisted operations (AIOps)
  • Cloud infrastructure management (AWS)
  • Scripting (Bash, Python, Perl)
  • AWS services
  • Docker
  • Kubernetes
  • Infrastructure-as-code tools (Terraform, CloudFormation)
  • CI/CD tools
  • Log analysis and monitoring tools
  • Relational/vector databases
  • RAG
  • GraphQL
  • SQL
  • Problem-solving skills
  • Communication skills
  • Bachelor's degree in computer science or related field

Nice to have

  • Passion for learning new technologies
  • Desire to solve problems

What the JD emphasized

  • 5+ years DevOps/SRE experience with cloud-based applications
  • Advanced hands-on Linux administration skills, including monitoring, troubleshooting, reliability, and security
  • Expert-level knowledge of AWS services (EC2, ECS, EKS, Lambda, ELB, S3, IAM, VPC, DynamoDB, RDS, etc.)
  • Hands-on experience with Docker, Kubernetes and container technologies
  • Proficiency with infrastructure-as-code tools (Terraform, CloudFormation)