Software Engineer - Production Engineering

Snowflake · Data AI · Warsaw, Poland · Engineering

This role is for a Software Engineer on the Production Engineering team at Snowflake, which drives the reliability tools and processes for Snowflake's services. The team champions SLOs, builds infrastructure for issue detection, and verifies system health. It emphasizes proactive issue prevention, rapid detection and diagnosis, efficient resolution, and learning from incidents. The role involves improving the full lifecycle of services, scaling systems through automation, establishing incident response rotations and blameless postmortems, writing code, developing documentation and capacity plans, debugging distributed systems, and collaborating on SLOs. The team operates on a 12x7 on-call rotation. Minimum qualifications include a CS degree or equivalent, proficiency in a modern language (preferably Golang), and systematic problem-solving skills. Preferred qualifications include experience with large-scale systems, observability tools, Kubernetes, Linux infrastructure, and public cloud providers.

What you'd actually do

  1. Improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
  2. Scale systems sustainably through automation, and participate in changes that improve reliability and velocity.
  3. Establish and practice low-noise incident response rotations and blameless postmortems to prevent problem recurrence.
  4. Write and review code. Develop documentation and capacity plans, and debug the hardest problems on large distributed systems.
  5. Collaborate with software engineers to establish, maintain, and optimize functional and performance SLOs.

Skills

Required

  • Bachelor's degree in Computer Science, a related technical field involving software engineering, or equivalent practical experience.
  • Proficient in at least one modern programming language, preferably Golang.
  • Systematic problem-solving methods and effective communication skills.

Nice to have

  • 3+ years of industry experience building and supporting large-scale systems in production.
  • Experience with modern observability tools and production monitoring practices.
  • Experience with containers and container orchestration systems such as Kubernetes.
  • Experience deploying, managing, and operating scalable and fault-tolerant Linux infrastructure.
  • Hands-on experience with one or more public cloud providers (AWS, Azure, or GCP).