Senior Production Engineer

Snowflake · Data AI · CA-Menlo Park, United States · Engineering

This role is for a Senior Production Engineer at Snowflake, focusing on driving the reliability, tools, and processes for their services. The responsibilities include managing the full lifecycle of services, scaling systems through automation, incident response, collaborating with software engineers on SLOs, and participating in on-call rotations. The ideal candidate has a CS degree, proficiency in a modern programming language, and systematic problem-solving skills. While the company emphasizes an 'agentic enterprise' and AI as a collaborator, the core responsibilities of this role are in traditional production engineering and reliability, not direct AI/ML model development or deployment.

What you'd actually do

  1. Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
  2. Scale systems sustainably through automation; drive changes that improve reliability and velocity.
  3. Establish and practice low-noise incident response rotations and blameless postmortems to prevent problem recurrence.
  4. Write and review code. Develop documentation and capacity plans, and debug the hardest problems on large distributed systems.
  5. Collaborate with software engineers to establish, maintain, and optimize functional and performance SLOs.

Skills

Required

  • Bachelor's degree in Computer Science, a related technical field involving software engineering, or equivalent practical experience.
  • Proficiency in at least one modern programming language.
  • Systematic problem-solving methods.
  • Effective communication skills.

Nice to have

  • Experience with capacity and load testing of distributed applications.
  • Experience with containers and container orchestration systems such as Kubernetes.
  • Experience in deploying, managing, and operating scalable and fault-tolerant Linux infrastructure.
  • Experience with SLO-driven reliability management processes.
  • Hands-on experience with one or more public cloud providers (AWS, Azure, or GCP).
  • Ability to prioritize tasks and work independently.

What the JD emphasized

  • U.S. export-controlled technologies