Distributed Systems Engineer - Data Platform - Analytics and Alerts

Cloudflare · Enterprise · Austin, TX · Engineering

Cloudflare is seeking experienced engineers to join their DATA Org to build the future of data, focusing on ingestion, processing, storage, and retrieval for logs and analytics. This role specifically targets the Analytics and Alerts group, involving the development and enhancement of customer-facing APIs, building a near real-time alerting platform, optimizing analytical queries, and scaling infrastructure. The position requires strong programming skills (preferably Go), experience with distributed systems, scalable APIs, observability stacks (Prometheus, Grafana), and SQL with complex query optimization.

What you'd actually do

  1. Develop and enhance our customer-facing APIs focusing on performance, reliability, and an intuitive user experience.
  2. Design, build, and maintain our near real-time alerting platform, from data processing and anomaly detection to reliable notification delivery.
  3. Optimise the performance of complex analytical queries that power our APIs and dashboards, working closely with the database platform team.
  4. Create intuitive and powerful tools that allow customers to explore their data and configure meaningful alerts based on logs and metrics.
  5. Scale our API and alerting infrastructure to support a growing number of internal and external use cases.

Skills

Required

  • 3+ years of experience working in software development covering distributed systems and scalable APIs.
  • Strong programming skills (Go is preferable) and a deep understanding of software development best practices for building performant, customer-facing services.
  • Hands-on experience with modern observability stacks, including Prometheus and Grafana, and a strong understanding of handling high-cardinality metrics at scale.
  • Strong knowledge of SQL and extensive experience with complex query optimisation.
  • A solid foundation in computer science, including algorithms, data structures, distributed systems, and concurrency.
  • Strong analytical and problem-solving skills, and a willingness to debug, troubleshoot, and learn about complex problems at high scale.
  • Ability to work collaboratively in a team environment and communicate effectively with other teams across Cloudflare.
  • Experience developing and scaling A

What the JD emphasized

  • highly motivated engineers
  • experienced engineers
  • complex distributed systems challenges
  • high-throughput, low-latency powerhouse
  • immense analytical workloads
  • near real-time alerting platform
  • complex analytical queries
  • high scale