Intermediate Site Reliability Engineer, Cloud Cost Utilization

GitLab · Enterprise · United Kingdom · Platforms Engineering

This role focuses on making cloud spending visible, understandable, and actionable by partnering with Engineering, Finance, and Product. It involves building and improving systems that track, attribute, and optimize cloud usage across AWS and GCP: developing resource tagging, improving billing data quality, and creating cost optimization tooling. The role also contributes to observability systems that connect cost signals with reliability and operational data.

What you'd actually do

  1. Design and maintain cloud resource tagging and labeling strategies across GCP and AWS to support accurate cost attribution
  2. Develop tooling and pipelines to ingest, normalize, and report on cloud billing data using the FOCUS specification
  3. Automate cost anomaly detection, forecasting, and alerting so engineering teams can respond quickly to changes in infrastructure spend
  4. Contribute to GitLab's observability and monitoring stacks, including Prometheus, LGTM (Loki, Grafana, Tempo, and Mimir), and ELK, with a focus on surfacing cost efficiency signals
  5. Partner with Finance and Engineering leadership to support cloud cost forecasting for planning and budget discussions

Skills

Required

  • cloud cost management in GCP and/or AWS
  • cloud billing data
  • cloud pricing models
  • cost optimization approaches
  • infrastructure as code
  • Terraform
  • Ansible
  • observability tooling
  • Grafana
  • working in a remote and asynchronous environment

Nice to have

  • FinOps FOCUS specification
  • connecting reliability and cost signals

What the JD emphasized

  • cloud cost management
  • FinOps FOCUS specification
  • cloud resource tagging and labeling strategies
  • observability tooling