Systems Engineer, Metrics and Alerting

Cloudflare · Enterprise · London, United Kingdom · Engineering

Cloudflare is seeking a Systems Engineer for its internal Observability team in London. The role focuses on designing, delivering, and operating software and platforms for metrics and alerting, solving scaling bottlenecks in critical services, and working on highly distributed systems. The position also involves contributing to open-source projects and participating in a global on-call rotation. The ideal candidate has a software engineering background with proficiency in Go, data structures, distributed Linux environments, and high-scale distributed systems, as well as tools like Prometheus, Alertmanager, and Thanos.

What you'd actually do

  1. Design, deliver, and operate software and a platform that progresses Cloudflare's Observability competency
  2. Solve scaling bottlenecks in critical services in our Metrics & Alerting pipeline
  3. Work on highly distributed and scalable systems
  4. Participate in the constant cycle of knowledge sharing and mentoring
  5. Participate in the global on-call rotation for the services your team owns

Skills

Required

  • Software Engineering background
  • Proficiency in high-level programming languages (e.g., Go)
  • Proficiency in data structures and databases such as TSDBs, columnar stores, or related systems
  • Proficiency in distributed Linux environments
  • Proficiency in designing high-scale distributed systems
  • Proficiency in Prometheus, Alertmanager, Thanos
  • Experience working in a fast, high-growth environment
  • Experience working in a 24/7/365 service environment
  • Exquisite written and verbal communication skills
  • Familiarity with internetworking, networking protocols at Layers 2 to 7 of the OSI model, and BGP
  • Strong bias for action

Nice to have

  • Experience with high-bandwidth transit internetworking and routing
  • Passion for code simplicity and performance

What the JD emphasized

  • AI-native curiosity
  • Leveraging AI to ship faster