Site Reliability Engineer 5 - Live SRE

Netflix · Big Tech · United States · Remote · Engineering

Site Reliability Engineer focused on live streaming events, managing cloud traffic, load testing, and implementing end-to-end observability to ensure availability at scale.

What you'd actually do

  1. Drive continual improvement in observability, monitoring, and scalability, with the primary goal of solving the thundering herd problem in cloud traffic (API gateway, IPC between microservices) for live streaming.
  2. Implement, automate, and execute a broad range of functional, performance, resilience, and fault-injection tests focused on live-streaming delivery, and analyze the results.
  3. Write and review code, develop documentation, and debug complex problems between systems and components.
  4. Coordinate, collaborate, and partner with multiple stakeholders to ensure the smooth execution of live-streaming events.
  5. Participate in an on-call rotation and work flexible hours based on the live-events schedule.

Skills

Required

  • Service reliability and operational experience running large-scale, high-performance systems and internet services
  • L4 load balancer, HTTP cache, and reverse proxy technologies
  • Unix or Linux systems
  • TCP/IP network fundamentals
  • Networking principles and transport and application protocols, especially DNS, TLS, and HTTP(S)
  • Go, Python, Rust
  • Real-time and big data analytics processing technologies (e.g., Kafka, time-series databases, Presto/Trino, Spark SQL)

Nice to have

  • B.S. in Computer Science, Electrical Engineering, or Computer Engineering (or equivalent professional experience)

What the JD emphasized

  • traffic at scale
  • live streaming