Senior Network Engineer

Cloudflare · Enterprise · Austin, TX · Engineering

A Senior Network Engineer role focused on building AI-enabled tooling and agents for network operations, integrating LLM-powered tools with operational systems, and applying AI to accelerate troubleshooting. The role also covers traditional network engineering responsibilities, such as operating and architecting Cloudflare's global network.

What you'd actually do

  1. Own the technical operation, engineering, and architecture of Cloudflare's global network, including planning, installation, and day-to-day management of hardware and software across the edge and backbone.
  2. Serve as a hands-on operational anchor for the team: diagnosing and resolving complex network faults, owning incident response end to end (including the on-call rotation), and contributing to post-incident reviews to drive continuous improvement.
  3. Build production AI agents for network operations. Design, ship, and own LLM-powered tools that integrate with our operational systems of record via tool calling. Ship them with evals, observability of agent decisions, cost tracking, and human-in-the-loop checkpoints where autonomous action carries blast radius.
  4. Apply AI to accelerate troubleshooting. Use and extend our internal AI platform (Workers AI, AI Gateway) to speed up root cause analysis, pattern recognition across faults, and operational decision-making under pressure.
  5. Architect network improvements that lower latency, reduce packet loss, and increase scale, optimising end-user experience across Cloudflare's global infrastructure.

Skills

Required

  • Proven track record in large-scale network engineering and operations.
  • Proficiency in Python, TypeScript, and/or Go sufficient to build, debug, and maintain agent code, not just to glue scripts together.
  • Working knowledge of agent integration patterns: function/tool calling, MCP or equivalent, retrieval over operational corpora (runbooks, postmortems, change history), and prompt iteration with evals.
  • Experience reasoning about agent failure modes in production: hallucination guards, fallback paths, rollback, blast-radius control.
  • Deep expertise in BGP and anycast routing, with the ability to diagnose and resolve complex routing issues in a production environment.
  • Strong understanding of MPLS and Segment Routing.
  • Proficiency across multiple network vendor operating systems (Juniper, Cisco, Arista, or similar).
  • Experience with network automation frameworks such as SaltStack, Ansible, or equivalent, and a strong instinct for solving operational problems through code.
  • Ability to prioritise effectively and lead calmly when faced with high-pressure scenarios.
  • An effective communicator who can adapt their style to any audience — whether guiding a junior engineer through a fault or presenting a network architecture decision to senior leadership.

Nice to have

  • Shipped LLM-powered tooling into production use by a team other than your own, with measurable operational impact.
  • Professional-level network certification (JNCIP, CCNP, or equivalent or higher).
  • Experience with optical transport technologies such as CWDM and DWDM.
  • Linux system administration.
  • Experience writing network configuration and design documentation at an architectural level.

What the JD emphasized

  • Build production AI agents for network operations
  • LLM-powered tools
  • tool calling
  • evals
  • observability of agent decisions
  • cost tracking
  • human-in-the-loop checkpoints
  • autonomous action carries blast radius
  • Apply AI to accelerate troubleshooting
  • internal AI platform (Workers AI, AI Gateway)
  • agent integration patterns
  • function/tool calling
  • retrieval over operational corpora
  • prompt iteration with evals
  • reasoning about agent failure modes in production
  • hallucination guards
  • fallback paths
  • rollback
  • blast-radius control
  • Shipped LLM-powered tooling into production use by a team other than your own, with measurable operational impact

Other signals

  • AI-native curiosity
  • AI-enabled tooling and agents
  • production AI agents for network operations
  • LLM-powered tools
  • Apply AI to accelerate troubleshooting
  • internal AI platform (Workers AI, AI Gateway)