Senior Infrastructure Engineer, Cloud

Webflow · Enterprise · Argentina · Remote · Engineering

Webflow is seeking a Senior Infrastructure Engineer to improve the reliability and stability of its customer-facing production infrastructure. The role will own and evolve the cloud substrate, design and maintain the networking fabric, build and enforce guardrails, drive FinOps, and partner on capacity planning. A key responsibility is building and maintaining AI-powered automation for infrastructure management, including policy-as-code, drift detection, and LLM-assisted runbook generation. The role requires experience with AWS, Kubernetes, and infrastructure-as-code tools, with a strong emphasis on automation and a proactive embrace of AI.

What you'd actually do

  1. Own and evolve the cloud substrate that Webflow's product and engineering teams depend on, including our compute layer, EKS fleet, networking, and cloud operations across AWS and GCP.
  2. Design and maintain the networking fabric that connects Webflow's services, ensuring reliability, security, and scalability across our cloud environments.
  3. Build and enforce guardrails around IAM, SCPs, and scoping permissions that keep infrastructure secure and auditable without slowing engineers down.
  4. Drive FinOps across Webflow's cloud footprint, owning cost attribution, right-sizing recommendations, and surfacing waste before it becomes a problem.
  5. Build and maintain AI-powered automation that improves how we manage cloud infrastructure, from policy-as-code and drift detection to LLM-assisted runbook generation.

Skills

Required

  • 5+ years of experience owning and operating cloud infrastructure in a customer-facing environment
  • Deep hands-on experience with AWS
  • Experience managing Kubernetes clusters at scale
  • Experience with infrastructure-as-code tools like Pulumi or Terraform
  • Experience navigating multi-region or multi-cloud environments on AWS or GCP
  • Proactive embrace of AI

Nice to have

  • Experience with Karpenter, cluster autoscaler, or other Kubernetes-native scaling tooling
  • Experience with GCP infrastructure alongside AWS in a multi-cloud environment
  • Experience building AI-assisted infrastructure tooling, including cost optimization loops, anomaly detection, or policy-as-code with LLM assistance
  • Experience contributing to multi-region architecture including data residency, regional failover, or latency-based routing

What the JD emphasized

  • customer-facing, production infrastructure
  • little to no downtime
  • AI-powered automation
  • LLM-assisted runbook generation
  • building and applying fluency in emerging technologies

Other signals

  • AI-powered automation for infrastructure management
  • LLM-assisted runbook generation
  • building and applying fluency in emerging technologies