Service Reliability Engineer, G&A Solutions Engineering (GSE)

Apple · Big Tech · Austin, TX +1 · Software and Services

Service Reliability Engineer for Apple's G&A Solutions Engineering team, focusing on the reliability, scalability, and performance of critical production systems. Responsibilities include monitoring, incident response, automation, and collaboration with development teams.

What you'd actually do

  1. Proactively monitor service performance, identify potential bottlenecks, and implement solutions to optimize efficiency and resilience
  2. Lead incident response efforts, driving rapid resolution and conducting thorough root cause analysis (RCA)
  3. Develop and implement automation strategies to streamline operational tasks, improve service resilience, and reduce manual intervention
  4. Apply SRE principles to maintain highly reliable and scalable service infrastructure
  5. Collaborate closely with development teams to ensure that new services are designed for operational excellence, incorporating best practices for monitoring, alerting, and scalability

Skills

Required

  • 3+ years of experience in a Site Reliability Engineering, DevOps, or related role, supporting large-scale, enterprise-level services
  • Strong proficiency in at least one programming language (e.g., Python, Java, Go) and scripting languages (e.g., Bash, PowerShell)
  • Experience with cloud platforms (e.g., AWS, Azure, GCP) and cloud-native technologies (e.g., Kubernetes, Docker)
  • Hands-on experience with monitoring and alerting tools (e.g., Prometheus, Grafana, Splunk, Datadog)
  • Experience performing RCA on technical issues
  • Bachelor's degree in Computer Science or equivalent work experience

Nice to have

  • Proven ability to troubleshoot complex issues in distributed systems
  • Familiarity with CI/CD pipelines and DevOps practices
  • Experience with database technologies (e.g., MySQL, PostgreSQL, NoSQL databases)
  • Knowledge of ITIL frameworks and incident management processes
  • Understanding of Linux/Unix system administration
  • Experience with configuration management tools (Ansible, Chef, Puppet)