Staff Software Engineer - Container Platform

Snowflake · Data AI · CA-Menlo Park, United States · Engineering

Staff Software Engineer to build and operate a container platform for Snowflake's production, AI/ML, and CI workloads across multiple clouds. The role involves managing hundreds of Kubernetes clusters, with a focus on reliability, automation, and developer experience for internal engineers.

What you'd actually do

  1. Own the design and delivery of large, complex platform initiatives spanning cluster lifecycle management, multi-cloud automation, and internal developer tooling.
  2. Identify and drive cross-team technical improvements across the platform, from architecture through adoption.
  3. Make and defend architectural trade-offs grounded in reliability, scalability, and operational reality.
  4. Act as a technical anchor for the team, developing expertise in others, raising engineering standards, and being the person engineers turn to on hard problems.
  5. Treat internal engineers as your primary customers and measure success by their velocity and the reliability of their experience on the platform.

Skills

Required

  • Go
  • Kubernetes
  • Distributed systems
  • Multi-cloud (AWS, Azure, GCP)

Nice to have

  • GPU infrastructure
  • AI/ML training workloads at scale
  • Open source contributions to Kubernetes or related ecosystem projects

What the JD emphasized

  • Significant experience designing and operating large-scale distributed systems in production.
  • Experience owning Kubernetes or similar orchestration systems in production at scale: cluster lifecycle, upgrades, and fleet management across a large heterogeneous fleet.
  • You've built platforms where your work quietly enabled hundreds or thousands of engineers to ship faster and more reliably.
  • You've made platform decisions with real blast radius: deprecating an API, rolling out a breaking change, or designing a self-service experience that teams actually want to use.