Site Reliability Engineer I

Booking · Hospitality · Bangalore, India · Engineering

This role focuses on treating operations as a software problem, emphasizing the reliability of systems and services by addressing availability, performance, scalability, latency, observability, and efficiency. The Site Reliability Engineer I will build software applications, design software systems, own services end-to-end, manage technical incidents, and focus on automation and toil reduction to improve system reliability and reduce operational costs.

What you'd actually do

  1. Build software applications using relevant development languages, applying sufficient knowledge of the systems, services and tools appropriate for the business area
  2. Evaluate possible architecture solutions, with basic knowledge of how to weigh cost, business requirements, technology requirements and emerging technologies
  3. Own a service end to end, with basic knowledge of how to actively monitor application health and performance, set and track relevant metrics, and act when their thresholds are violated
  4. Address and resolve live production issues, with basic knowledge of how to mitigate customer impact within SLA
  5. Keep infrastructure current, with basic knowledge of how to reduce technical debt, find bottlenecks and prepare for scaling

Skills

Required

  • software development languages
  • systems, services and tools
  • design patterns
  • testing techniques and methods
  • readable and reusable code
  • data security, integrity and quality
  • architecture solutions
  • infrastructure and architecture understanding
  • engineering techniques
  • monitoring application health and performance
  • metrics
  • business continuity risks
  • runbooks and OpDocs
  • continuous delivery and experimentation frameworks
  • deployment and operations
  • production management
  • live production issues
  • SLA management
  • root cause analysis
  • postmortem processes
  • technical debt reduction
  • bottleneck identification
  • scaling preparation
  • automation

Nice to have

  • commercial awareness
  • vendor partnerships