Senior Infrastructure Engineer, Observe by Snowflake

Snowflake · Data AI · CA-Menlo Park, United States · Engineering

Snowflake is seeking a Senior Infrastructure Engineer for its Observe by Snowflake team. The role focuses on architecting, scaling, and operating the AWS cloud infrastructure behind the team's AI-powered observability platform. Responsibilities include leading design and build efforts, driving architectural improvements for reliability and performance, owning CI/CD pipelines, identifying and mitigating risks, and mentoring other engineers. The ideal candidate has 5+ years of experience in infrastructure, SRE, or DevOps roles, strong experience with Kubernetes or Nomad, proficiency with Infrastructure-as-Code tools such as Terraform, and programming skills in Go or Python.

What you'd actually do

  1. Lead the design, build, and operation of scalable, secure cloud infrastructure in AWS supporting a high-scale observability platform.
  2. Drive architectural improvements that enhance reliability, performance, scalability, and operational visibility across development and production environments.
  3. Own and evolve CI/CD pipelines, developer tooling, and platform automation to improve productivity and deployment safety at scale.
  4. Proactively identify reliability, performance, and security risks, and lead efforts to mitigate them.
  5. Design and implement infrastructure patterns that ensure high availability, fault tolerance, and operational resilience.

Skills

Required

  • Infrastructure Engineering
  • Site Reliability Engineering (SRE)
  • DevOps
  • AWS
  • Kubernetes
  • Nomad
  • Terraform
  • Ansible
  • Go
  • Python

Nice to have

  • Observability platforms
  • Telemetry pipelines
  • Monitoring infrastructure
  • Internal developer platforms
  • GCP
  • Azure

What the JD emphasized

  • 5+ years of experience in Infrastructure Engineering, Site Reliability Engineering (SRE), DevOps, or related roles.
  • Demonstrated experience designing and operating production systems at scale, with deep ownership of reliability and operational excellence.
  • Strong experience with container orchestration platforms such as Kubernetes or Nomad, including architectural decision-making and operational tuning.
  • Hands-on experience managing cloud infrastructure using Infrastructure-as-Code tools such as Terraform, Ansible, or similar, with a focus on scalable system design.
  • Strong programming skills in Go, Python, or similar languages, with a track record of building automation and infrastructure systems.