SRE, London

Apple · Big Tech · London, United Kingdom +1 · Software and Services

Site Reliability Engineer for FoundationDB infrastructure at Apple, focusing on operating and scaling distributed systems across multiple data centers and cloud environments. Responsibilities include provisioning, managing, and monitoring FoundationDB, developing automation in Java and Go, and collaborating with development teams.

What you'd actually do

  1. Our team is responsible for the provisioning, managing, and monitoring of FoundationDB in production across multiple regions and control planes (bare-metal, AWS, and Kubernetes).
  2. We develop much of our own automation in Java and Go, including our open-source Kubernetes Operator (https://github.com/FoundationDB/fdb-kubernetes-operator).
  3. We work closely with our dev partners to develop a robust and scalable database, often engaging in projects as a single team.

Skills

Required

  • Strong sense of ownership and integrity demonstrated through clear communication and collaboration
  • Experience in managing and scaling distributed systems in a public, private, or hybrid cloud environment
  • The ability to design, author, and release code in languages like (but not limited to) Go, Java, or Python
  • Acute drive to automate manual operations and to improve them through repeated iteration
  • Understanding of the Linux operating system and of standard networking protocols and components

Nice to have

  • Hands-on experience managing large numbers of diverse systems with configuration management or software delivery platforms (such as Puppet, Chef, Ansible, and Spinnaker)
  • Experience with deploying, supporting and monitoring new and existing services, platforms, and application stacks
  • Excellent troubleshooting and problem-solving skills
  • Experience with scale testing, disaster recovery, and capacity planning
  • Familiarity with microservices architecture and container orchestration with Kubernetes