Senior Site Reliability Engineer

Algolia · Enterprise · Paris, France · Production Engineering

This role is for a Senior Site Reliability Engineer at Algolia, a company specializing in AI Search. The primary focus is on operating and improving the reliability, scalability, and cost-efficiency of Algolia's Search products. Responsibilities include building automated incident response, improving system reliability and performance, monitoring SLOs, reducing toil, and managing incidents. The role requires experience in operating scalable architectures, programming skills, API experience, cloud provider familiarity (GCP, AWS, Azure), and Linux system administration.

What you'd actually do

  1. Operating the Search products, building self-healing and automated incident response mechanisms
  2. Building components that improve reliability and performance
  3. Monitoring and computing the SLO and the error budget of the product you operate
  4. Reducing toil and technical debt by automating tasks and increasing the quality of existing components
  5. Managing incidents and customer requests

Skills

Required

  • 5 years of experience in a scalable environment
  • Knowledge of at least one programming language (Python, Golang, Ruby)
  • Familiarity with software craftsmanship
  • Experience working with APIs
  • A focus on designing reliable, operable, and highly available applications
  • Familiarity with at least one public cloud provider, such as GCP, AWS, or Microsoft Azure, and its Kubernetes service
  • A good understanding of Linux system administration, networking, and troubleshooting
  • Strong communication and organizational skills