Distributed Systems Engineer - Data Platform - Analytics and Alerts

Cloudflare · Enterprise · London, United Kingdom · Engineering

Cloudflare is seeking experienced engineers to join its DATA Org, which builds and maintains the data infrastructure behind analytics and alerts. The role involves developing customer-facing APIs, building a near real-time alerting platform, optimizing analytical queries, and scaling infrastructure. While the company mentions leveraging AI, the core responsibilities are in distributed systems, data pipelines, and scalable APIs, not direct AI/ML model development.

What you'd actually do

  1. Develop and enhance our customer-facing APIs focusing on performance, reliability, and an intuitive user experience.
  2. Design, build, and maintain our near real-time alerting platform, from data processing and anomaly detection to reliable notification delivery.
  3. Optimise the performance of complex analytical queries that power our APIs and dashboards, working closely with the database platform team.
  4. Create intuitive and powerful tools that allow customers to explore their data and configure meaningful alerts based on logs and metrics.
  5. Scale our API and alerting infrastructure to support a growing number of internal and external use cases.

Skills

Required

  • 3+ years of experience working in software development covering distributed systems and scalable APIs.
  • Strong programming skills (Go is preferable).
  • Deep understanding of software development best practices for building performant, customer-facing services.
  • Hands-on experience with modern observability stacks, including Prometheus and Grafana.
  • Strong understanding of handling high-cardinality metrics at scale.
  • Strong knowledge of SQL, including extensive experience with complex query optimisation.
  • A solid foundation in computer science, including algorithms, data structures, distributed systems, and concurrency.
  • Strong analytical and problem-solving skills.
  • Willingness to debug, troubleshoot, and learn about complex problems at high scale.
  • Ability to work collaboratively in a team environment and communicate effectively with other teams across Cloudflare.

Nice to have

  • Go (also listed as the preferred language under Required)

What the JD emphasized

  • highly motivated engineers
  • complex distributed systems challenges
  • high-throughput, low-latency powerhouse
  • immense analytical workloads
  • near real-time alerting platform
  • complex analytical queries
  • Scale our API and alerting infrastructure
  • 3+ years of experience
  • distributed systems and scalable APIs
  • high-cardinality metrics at scale
  • complex query optimisation
  • algorithms, data structures, distributed systems, and concurrency
  • complex problems at high scale