Staff Backend Engineer - Adaptive Telemetry | UK | Remote

Grafana Labs · Data AI · EMEA, Germany, Spain, Sweden, United Kingdom · Remote · R&D: Databases

This is a Staff Backend Engineer role at Grafana Labs focused on Adaptive Telemetry for metrics, logs, and traces. It involves driving technical strategy, leading project delivery, and owning system architecture, reliability, performance, and cost, as well as defining SLOs/SLIs, improving observability and automation, aligning stakeholders, and mentoring engineers. The company uses AI coding assistants and provides access to frontier models for developer productivity, but the core of the role is backend engineering for observability systems, not AI model development.

What you'd actually do

  1. Drive technical strategy and roadmap. Proactively define the architectural vision, prioritize work that unlocks major product or platform improvements, and influence product and engineering decisions.
  2. Lead end-to-end delivery of large, cross-functional projects. Own planning, design, execution, rollout, and long-term operation of these initiatives.
  3. Own architecture, reliability, performance and cost for critical systems. Make pragmatic architecture choices that balance scalability, availability, latency and cost while ensuring systems remain maintainable and evolvable.
  4. Define SLOs/SLIs and lead incident response. Establish measurable reliability targets, run high-severity incident response, lead blameless post-mortems, and drive systemic fixes and automation to prevent recurrence.
  5. Improve observability, automation and operational readiness. Champion telemetry, alerting, runbooks, capacity planning and automation efforts that reduce toil, speed debugging and lower MTTR.

Skills

Required

  • distributed systems
  • systems design
  • backend engineering
  • observability
  • reliability
  • performance optimization
  • cost optimization
  • incident response
  • automation
  • technical leadership
  • mentoring

Nice to have

  • experience shipping and operating complex systems that span multiple teams

What the JD emphasized

  • Proven delivery of large distributed systems
  • Strong systems-design instincts