Senior Site Reliability Engineer

Autodesk · Enterprise · Singapore

This role is for a Senior Site Reliability Engineer (SRE) focused on building and optimizing a cloud-based platform. Responsibilities include developing secure, high-performance cloud services, enhancing system design, improving team processes, and participating in on-call rotations. A key aspect is developing AI-based solutions to improve reliability, efficiency, and productivity, and experience with AIOps and AI-driven tools is preferred.

What you'd actually do

  1. Develop and maintain secure, high-performance cloud services
  2. Collaborate with architects, designers, engineers and key stakeholders to translate requirements into product features and capabilities
  3. Enhance system design and architecture with your cloud expertise throughout the development lifecycle
  4. Improve team processes to meet business needs efficiently
  5. Review services, assess implementations, and recommend improvements
  6. Develop AI-based solutions to boost reliability, efficiency and productivity
  7. Participate in on-call rotations for production support

Skills

Required

  • BS or MS in Computer Science or related technical field or relevant experience
  • 6 to 10 years of hands-on experience with cloud services and applications
  • Demonstrated ability to solve problems and work with complex systems
  • Understanding of data structures, algorithms, and programming
  • Practical experience with AWS or other major cloud providers
  • Proficiency in Infrastructure as Code (IaC) tools such as Terraform or CloudFormation
  • Knowledge of SRE principles and incident metrics
  • Experience designing and maintaining scalable, production-grade platforms and services

Nice to have

  • Experience with cloud platforms such as AWS, Azure or GCP
  • Familiarity with databases such as MySQL, Redis, DynamoDB
  • Experience with APM, observability, logging and alerting tools (e.g. Dynatrace, New Relic, Datadog, Splunk)
  • Proficiency in Python or similar scripting languages
  • Skill in building scalable, secure, observable and production-grade cloud services
  • Experience with AIOps and AI-driven tools for system reliability, automated incident response, and predictive maintenance
  • Experience working in a Scrum team and Agile setup

What the JD emphasized

  • AI-based solutions to boost reliability, efficiency and productivity
  • AIOps and AI-driven tools for system reliability, automated incident response, and predictive maintenance