Principal Software Engineer: Resiliency

Cloudflare · Enterprise · Austin, TX; Washington, DC · Engineering

Principal Software Engineer role focused on building and managing the internal Control Plane for Cloudflare's SREs and Infrastructure Operations teams. The role involves developing tools to manage a large, globally distributed fleet of servers, storage, and network gear, contributing to the future of Cloudflare's infrastructure at scale. It also includes mentoring engineers and working on Health Mediated Deployment projects.

What you'd actually do

  1. Work with several teams of passionate and talented engineers who are building the internal Control Plane used by our SREs and Infrastructure Operations teams to manage our internal DCaaS and IaaS platforms.
  2. Be responsible for tools that support the management of a growing, globally distributed fleet of servers, storage, and network gear spread across more than a thousand colos worldwide.
  3. Play an active part in shaping the future of the infrastructure that propels Cloudflare’s scale and growth.
  4. Write code to bring infrastructure designs to fruition, and mentor high-potential engineers on their distributed systems journey.
  5. Deliver on the key Health Mediated Deployment projects, which are tracked by Cloudflare's senior leadership up to the founders.

Skills

Required

  • Minimum 10 years of experience working with distributed systems.
  • Experience designing, building, and managing high-volume software applications.
  • Expert in at least one modern strongly-typed programming language.
  • Experience debugging, measuring, optimizing and identifying failure modes in a large-scale distributed system.
  • Excellent collaboration skills.
  • Proven ability to convey ideas effectively through verbal and written communication.
  • Ability to translate business needs into requirements, design documents, and technical solutions.
  • Knowledge of API design standards, patterns, and best practices.
  • Proven ability to use data to drive business outcomes.
  • Proven experience in developing architects and lead engineers.
  • Solid understanding of computer science fundamentals including data structures, algorithms, and object-oriented or functional design.

Nice to have

  • Experience with optimizing and scaling infrastructure provisioning, repair, and decommissioning processes and automations.
  • Experience with scaling and simplifying Configuration Management systems that manage hundreds of thousands of nodes.

What the JD emphasized

  • building the internal Control Plane
  • manage our internal DCaaS and IaaS platforms
  • tools that support the management of a growing, globally distributed fleet of servers, storage, and network gear