Senior Site Reliability Engineer

Honeycomb · Enterprise · Ireland · Remote · Engineering

Senior Site Reliability Engineer role focused on scaling backend systems, improving reliability, and enhancing developer experience within a distributed team at Honeycomb, an observability platform company. The role involves working with AWS, Kubernetes, Kafka, and other infrastructure tools, participating in incident command, and contributing to a cross-Atlantic engineering culture.

What you'd actually do

  1. Help Honeycomb scale our backend systems to support our highest-volume customers.
  2. Build organizational trust through transparent communication, giving and receiving direct and kind feedback.
  3. Work with other backend teams to dive deep into our stack and make sure we’re getting the most out of our infrastructure.
  4. Be trained as an Incident Commander, serve in the role, and then train others.
  5. Help SRE and Honeycomb develop a healthy cross-Atlantic engineering culture.

Skills

Required

  • AWS
  • Kubernetes
  • Cost analysis and reduction
  • Helm
  • Terraform
  • CI/CD
  • Project management
  • Software engineering
  • Kafka or another high-volume distributed system
  • Excellent written and spoken communication
  • Operating in ambiguity
  • Bias for action and experimentation
  • Geographically distributed teams

Nice to have

  • Golang
  • Performance engineering
  • Observability concepts (SLOs, instrumentation)
  • Data-driven decision making
  • Interest in both the technical and human sides of reliability engineering

What the JD emphasized

  • highest-volume customers
  • dive deep into our stack
  • Incident Commander
  • cross-Atlantic engineering culture