Staff Backend Engineer - Application Core Services, Stacks | USA | Remote

Grafana Labs · Data AI · Canada, United States · Remote · R&D: Platform

The Staff Backend Engineer designs, builds, and operates the systems that manage Grafana Cloud stacks at scale. This includes reconciliation systems and lifecycle workflows, as well as keeping customer environments reliable and observable. The role sits at the intersection of product, platform, and business operations.

What you'd actually do

  1. Design, build, and operate reconciliation systems, including the SSS backend, that track desired stack state and detect and repair drift across stack templates, grafana.com state, Hosted Grafana, and actual customer stack configuration
  2. Work across SSS, grafana.com, and deployment configuration to keep stack lifecycle workflows reliable, observable, and resilient
  3. Improve operational efficiency by reducing deployment complexity (e.g., working toward regional SSS deployments that need only a single PR) and by contributing to the Stack Config Reconciliation project
  4. Manage rollout mechanisms for provisioned plugins, dashboards, data sources, Grafana versions, release channels, and stack-level configuration
  5. Support new region and cluster rollouts, including the operational paths required to bring stacks online safely in new Grafana Cloud regions

Skills

Required

  • backend engineering
  • system design
  • distributed systems
  • reliability engineering
  • observability
  • workflow automation
  • cloud infrastructure
  • customer-facing systems

Nice to have

  • open-source contributions
  • experience with observability tools (Grafana, Mimir, Loki, Tempo)
  • experience with billing systems
  • experience with cloud marketplaces (AWS, Azure, GCP)

What the job description emphasized

  • You will help own the production behavior of the systems you build.