Senior Site Reliability Engineer

Honeycomb · Enterprise · Ireland · Remote · Engineering

This Senior Site Reliability Engineer role focuses on scaling backend systems, improving reliability, and enhancing developer experience within a distributed team. It involves working with AWS, Kubernetes, Kafka, and other infrastructure tools, participating in incident command, and contributing to a healthy engineering culture.

What you'd actually do

  1. Help Honeycomb scale our backend systems to support our highest-volume customers.
  2. Build organizational trust through transparent communication and by giving and receiving direct, kind feedback.
  3. Work with other backend teams to dive deep into our stack to make sure we’re getting the most out of our infrastructure.
  4. Train as an Incident Commander, serve in the role, and then train others to do the same.
  5. Help SRE and Honeycomb develop a healthy cross-Atlantic engineering culture.

Skills

Required

  • AWS
  • Kubernetes
  • Cost analysis and reduction
  • Helm
  • Terraform
  • CI/CD
  • Project management
  • Software engineering
  • Kafka or another high-volume distributed system
  • Excellent written and spoken communication
  • Comfort operating in ambiguity
  • Interest in both the technical and human sides of reliability engineering
  • Experience working in geographically distributed teams
  • Familiarity with observability concepts (SLOs, instrumentation) and data-driven decision making

Nice to have

  • Golang
  • Performance engineering

What the JD emphasized

  • highest-volume customers
  • dive deep into our stack
  • Incident Commander
  • cross-Atlantic engineering culture
  • AWS
  • Kubernetes
  • cost analysis and reduction
  • Helm
  • Terraform
  • CI/CD
  • Golang
  • performance engineering
  • Kafka
  • high-volume distributed system
  • observability concepts
  • SLOs
  • instrumentation
  • data-driven decision making
  • geographically distributed teams