Senior Site Reliability Engineer

Replit · Enterprise · EUROPE · Remote · Engineering

The Senior Site Reliability Engineer ensures the reliability, scalability, and performance of Replit's infrastructure. Responsibilities include designing and implementing observability solutions, driving automation and infrastructure as code, establishing SLOs and SLIs, managing incidents, and optimizing performance.

What you'd actually do

  1. Design and Implement Observability Solutions
  2. Drive Automation and Infrastructure as Code
  3. Establish SLOs and SLIs
  4. Manage and Respond to Incidents
  5. Optimize Performance

Skills

Required

  • Site Reliability Engineering
  • DevOps
  • Systems Engineering
  • Infrastructure Engineering
  • Python
  • Go
  • Distributed systems
  • Kubernetes
  • Cloud-native technologies
  • Monitoring
  • Observability
  • Incident management
  • Infrastructure as code
  • Configuration management

Nice to have

  • Google Cloud Platform (GCP)
  • Prometheus
  • Grafana
  • Datadog

What the JD emphasized

  • 4-8 years of experience in Site Reliability Engineering or similar roles (DevOps, Systems Engineering, Infrastructure Engineering)
  • Strong programming skills in languages commonly used for automation (Python, Go, or similar)
  • Deep understanding of distributed systems
  • Experience with container orchestration platforms (Kubernetes) and cloud-native technologies
  • Proven track record of implementing and maintaining monitoring/observability solutions
  • Strong incident management skills with experience leading incident response
  • Experience with infrastructure as code and configuration management tools