Site Reliability Engineer 5

Netflix · Big Tech · Warsaw, Poland · Engineering

Netflix is hiring a Site Reliability Engineer 5 in Warsaw, Poland. The role focuses on improving the reliability and resilience of internal Netflix services through automation, observability, and proactive measures. Responsibilities include collaborating with engineering teams, developing automation tools, conducting capacity planning, participating in incident response, and improving monitoring and alerting systems. The role requires 3+ years of SRE experience, strong scripting skills (Python, Go, Java, or JavaScript/Node.js), and experience with distributed systems, incident management, Infrastructure as Code (Terraform), containers and orchestration (Docker, Kubernetes), and cloud platforms (AWS).

What you'd actually do

  1. Collaborate with engineering and product teams to integrate observability, reliability, and security considerations into the entire software development lifecycle.
  2. Develop and implement automation tools for monitoring, deployment, and incident response to ensure efficient and reliable operations.
  3. Conduct or participate in capacity planning, performance analysis, and system tuning to optimize system reliability.
  4. Participate in on-call rotations and contribute to incident response, diagnosis, and resolution.
  5. Implement and improve monitoring and alerting systems to proactively identify and address potential issues.

Skills

Required

  • 3+ years of experience as a Site Reliability Engineer or in a similar role
  • Strong scripting and programming skills (Python, Go, Java, or JavaScript/Node.js)
  • Experience operating complex sociotechnical systems successfully at scale
  • Experience with incident management and response
  • Experience with Infrastructure as Code tools like Terraform and with container tooling like Docker and Kubernetes
  • Experience with cloud platforms like AWS, microservices architecture, and enterprise software solutions like Slack & GSuite
  • Excellent communication & collaboration skills
  • Proven ability to troubleshoot complex issues and implement effective solutions

Nice to have

  • Familiarity with Human Factors Engineering
  • Ability to grow expertise, influence & educate others