Site Reliability Engineer

Apple · Big Tech · Seattle, WA · Software and Services

We are looking for Site Reliability Engineers with experience in developing processes, tools, and automation for managing distributed systems in production environments. Our SRE team combines software engineering, systems engineering, and system administration practices to build and run large-scale, massively distributed, fault-tolerant systems. Our software ensures that Apple’s services are reliable, scalable, and secure, and we leverage both open source and home-grown technologies to provide managed data infrastructure services. You will help build next-generation search infrastructure and platform services, collaborating cross-functionally with various ASE teams, from store and commerce to search and recommendations. You’ll create platforms that can rapidly scale to serve personalized and non-personalized data with very low latencies.

What you'd actually do

  1. Develop processes, tools, and automation for managing distributed systems in production environments.
  2. Build and run large-scale, massively distributed, fault-tolerant systems.
  3. Help build next-generation search infrastructure and platform services.
  4. Create platforms that can rapidly scale to serve personalized and non-personalized data with very low latencies.
  5. Contribute to all major components of the Redis deployment infrastructure, including maintenance automation, the backup service application, monitoring and alerting tooling and dashboards, and deployment architecture, with a focus on stability, performance, and scaling.

Skills

Required

  • Bachelor's Degree in Computer Science, an engineering-related field, or equivalent related experience.
  • 3-5 years of experience in a role focused on Site Reliability Engineering.
  • Proficiency in one or more of the following programming languages: Java, Go (Golang), Python.
  • Understanding of core SRE concepts: monitoring, alerting, and incident management.
  • Understanding of database concepts (consistency models, isolation levels, crash and recovery semantics).
  • Performance engineering (design concepts, profile-guided optimization).
  • Service management across Kubernetes, bare metal, and virtualized (EC2) platforms.
  • Datacenter architecture (networking topologies, host placement strategies, and failure modes); design of multi-datacenter systems; failure domains; and wide-area networking.

Nice to have

  • Demonstrated expertise in distributed systems, storage engines, or performance engineering.
  • Experience developing critical internet services and/or platform infrastructure.
  • Experience managing services on Kubernetes.
  • Experience with EC2, EBS, and Terraform.

What the JD emphasized

  • excellent communication
  • high degree of customer focus
  • experience in this area is a plus
  • Prior experience with development or maintenance of distributed databases / storage systems is recommended
  • Apple values craftsmanship
  • Performance is a key ingredient
  • define the metrics
  • set targets
  • uncover optimization opportunities
  • define quality guardrails
  • ship a product/service that will delight our customers
  • engineers who enjoy deep technical engineering that spans large cross-organizational projects
  • Your willingness to learn and implement new technologies will contribute to the continuous evolution of our organization.