Staff Backend Engineer - Application Core Services, Stacks | Canada | Remote

Grafana Labs · Data AI · Canada, United States · Remote · R&D: Platform

Staff Backend Engineer responsible for designing, building, and operating systems that manage Grafana Cloud stacks at scale. This includes reconciliation systems, lifecycle workflows, and rollout mechanisms for plugins, dashboards, and configurations, ensuring reliability and operational efficiency in a customer-facing environment.

What you'd actually do

  1. Design, build, and operate reconciliation systems, including the SSS backend, that track desired stack state and detect and repair drift across stack templates, grafana.com state, Hosted Grafana, and actual customer stack configuration
  2. Collaborate across SSS, grafana.com, and deployment configurations to ensure stack lifecycle workflows remain reliable, observable, and resilient
  3. Improve operational efficiency by reducing deployment complexity (e.g., aiming for single-PR regional SSS deployments) and by contributing to the Stack Config Reconciliation project
  4. Manage rollout mechanisms for provisioned plugins, dashboards, data sources, Grafana versions, release channels, and stack-level configuration
  5. Support new region and cluster rollouts, including the operational paths required to bring stacks online safely in new Grafana Cloud regions

Skills

Required

  • Backend engineering
  • System design
  • Distributed systems
  • Reliability engineering
  • Observability
  • Automation
  • Workflow management
  • Cloud infrastructure

Nice to have

  • Open source contributions
  • Experience with observability tools (Grafana, Mimir, Loki, Tempo)
  • Experience with billing systems
  • Experience with cloud marketplaces (AWS, Azure, GCP)

What the JD emphasized

  • customer-impacting stack lifecycle work
  • own the production behavior of the systems you build
  • debugging across service boundaries
  • making careful changes in systems that affect customer stacks