Distributed Systems Engineer - Data Platform (Delivery, Database, Retrieval)

Cloudflare · Enterprise · Austin, TX · Engineering

Cloudflare is seeking experienced engineers to join its DATA Org to build and maintain the future of data at Cloudflare. This role involves working on distributed systems across the entire data lifecycle, including ingestion, processing, storage, and retrieval, to power logs and analytics for customers. Responsibilities include building data delivery pipelines, analytical database platforms (ClickHouse), and customer-facing data retrieval products like GraphQL APIs and alerting solutions. The role requires strong programming skills (preferably Golang), experience with distributed systems, databases, and observability stacks, and a solid computer science foundation.

What you'd actually do

  1. Design, develop, and maintain scalable and reliable distributed systems across the entire data lifecycle.
  2. Build and optimise key components of our high-throughput data delivery platform to ensure data integrity and low-latency delivery.
  3. Develop new and improve existing components for the Cloudflare Analytical Platform to extend functionality and performance.
  4. Scale, monitor, and maintain the performance of our large-scale database clusters to accommodate the growing volume of data.
  5. Develop and enhance our customer-facing GraphQL APIs, log delivery, and alerting solutions, focusing on performance, reliability, and user experience.

Skills

Required

  • 3+ years of experience working in software development covering distributed systems and databases
  • Strong programming skills (Golang is preferable)
  • Deep understanding of software development best practices and principles
  • Hands-on experience with modern observability stacks, including Prometheus and Grafana
  • Strong knowledge of SQL and database internals, including experience with database design, optimisation, and performance tuning
  • A solid foundation in computer science, including algorithms, data structures, distributed systems, and concurrency
  • Strong analytical and problem-solving skills
  • Ability to work collaboratively in a team environment and communicate effectively with other teams across Cloudflare

Nice to have

  • Experience with ClickHouse
  • Experience with data streaming technologies (e.g., Kafka, Flink)

What the JD emphasized

  • high-cardinality metrics at scale