Senior Platform Engineer (Cloud Platform)

Amplitude · Data AI · San Francisco, CA · Engineering: Infrastructure

Amplitude, an AI analytics company, is hiring a Senior Platform Engineer to build and evolve its cloud platform. The role focuses on making Kubernetes and AWS infrastructure effortless for engineers and AI agents while improving developer experience, reliability, and security. Responsibilities include leading platform projects, building AI-augmented tooling, owning IaC for Kubernetes, AWS, and GCP, evolving CI/CD, driving observability, and reducing toil.

What you'd actually do

  1. Lead high-impact platform projects — design and ship capabilities that move the needle on developer experience, reliability, or security, and set the bar for quality, testing, and safe deployment practices.
  2. Build the AI-augmented platform. Design tooling and workflows that help engineers get more out of AI-assisted development — think infra primitives that are easy to reason about, automated review, and policy-as-code that keeps the guardrails strong as AI shifts how code gets written.
  3. Own Infrastructure-as-Code for Kubernetes, AWS, and GCP using Terraform, Helm, Kustomize, and emerging tooling — and make it consumable enough that an LLM can safely PR against it.
  4. Evolve our CI/CD backbone (Argo CD / Workflows / Rollouts, GitHub Actions) to make deploys faster, safer, and easier to reason about.
  5. Instrument and operate. Drive observability with Datadog and Amplitude, own dashboards and SLOs, and use the data to push reliability forward.

Skills

Required

  • 5+ years of experience in software engineering, DevOps, or Site Reliability Engineering
  • Production experience operating Kubernetes (EKS, GKE, AKS, or on-prem) and containerized applications at meaningful scale
  • Proficiency in at least one programming language (Golang or Python preferred)
  • Proficiency in IaC tooling (Terraform)
  • Working knowledge of AWS core services (EC2, EKS, IAM, VPC, ALB, S3)
  • Working knowledge of networking/security fundamentals
  • Familiarity with GitOps workflows
  • Familiarity with the CNCF ecosystem (Argo, Helm, Backstage, Envoy, and friends)
  • A track record of delivering projects that measurably improved reliability, performance, or developer productivity
  • Curiosity and conviction about AI as a force multiplier in infrastructure work

Nice to have

  • Golang
  • Python
  • Argo CD
  • Argo Workflows
  • Argo Rollouts
  • GitHub Actions
  • Datadog
  • Helm
  • Kustomize
  • Backstage
  • Envoy

What the JD emphasized

  • AI agents are first-class users
  • make it consumable enough that an LLM can safely PR against it
  • AI-augmented platform
  • AI-assisted development

Other signals

  • building platforms for AI agents
  • making Kubernetes effortless for AI agents
  • tooling for AI-assisted development