Skills

Required

Experience with AWS
Deep understanding of Kubernetes architecture and day-to-day cluster management
Experience with Security Services/Internet Infrastructure providers, e.g. Cloudflare
Proficiency in alerting and monitoring tools
Proficiency with Infrastructure as Code tools (Terraform, Kustomize and Helm)
Experience with CI/CD pipelines and GitOps practices
Strong scripting and automation skills in Bash and/or Python
Solid knowledge of networking principles
A proactive mindset with the ability to work in a fast-paced environment

Nice to have

Familiarity with incident management practices (on-call, runbooks, postmortems, disaster recovery)
Understanding of Zero Trust security models and security best practices in cloud environments
Exposure to Service Mesh (Istio, Linkerd) and container networking
Experience with cost optimisation and cloud spend monitoring
Knowledge of managing permission models on distributed systems

Snyk is the leader in secure AI software development, helping millions of developers develop fast and stay secure as AI transforms how software is built. Our AI-native Developer Security Platform integrates seamlessly into development and security workflows, making it easy to find, fix, and prevent vulnerabilities — from code and dependencies to containers and cloud.

Our mission is to empower every developer to innovate securely in the AI era, boosting productivity while reducing business risk. We’re not your average security company: we build Snyk on One Team, Care Deeply, Customer Centric, and Forward Thinking.

It’s how we stay driven, supportive, and always one step ahead as AI reshapes our world.

Why this role?

We are seeking a skilled and proactive Staff Site Reliability Engineer (SRE) to join our team and support the growth of the Snyk API & Web by building scalable, reliable, and secure cloud infrastructure. You will be responsible for ensuring the performance and uptime of our systems while adopting DevOps best practices and leveraging modern tools.

What You’ll Do:

Ensuring high availability, scalability, and disaster recovery across all systems.
Leading architectural discussions and making strategic decisions related to scalability, security, and availability.
Driving continuous improvement of our infrastructure, deployment, and monitoring processes.
Collaborating with development and operations teams to improve deployment processes and infrastructure resiliency.
Acting as a subject-matter expert for the SRE team and cross-functional engineering groups.
Mentoring and supporting other engineers, helping to grow team skills and practices.
Leading root cause analysis processes and post-incident reviews to ensure learning and resilience improvements.
Spreading the word about reliability, observability, and automation across the organisation.

What You Bring:

Experience with AWS (open to other cloud providers)
Deep understanding of Kubernetes architecture and day-to-day cluster management, including experience managing complex Kubernetes environments
Experience with Security Services/Internet Infrastructure providers, e.g. Cloudflare
Proficiency in alerting and monitoring tools
Proficiency with Infrastructure as Code tools (Terraform, Kustomize and Helm)
Experience with CI/CD pipelines and GitOps practices, using ArgoCD or similar tools
Strong scripting and automation skills in Bash and/or Python
Solid knowledge of networking principles
A proactive mindset with the ability to work in a fast-paced environment

It’d Be Awesome If You Also…

Have familiarity with incident management practices (on-call, runbooks, postmortems, disaster recovery).
Understand Zero Trust security models and security best practices in cloud environments.
Have exposure to Service Mesh (Istio, Linkerd) and container networking.
Have experience with cost optimisation and cloud spend monitoring.
Have knowledge of managing permission models on distributed systems.

We care deeply about the warm, inclusive environment we’ve created and we value diversity – we welcome applications from those typically underrepresented in tech. If you like the sound of this role but are not totally sure whether you’re the right person, do apply anyway!

About Snyk

Snyk is committed to creating an inclusive and engaging environment where our employees can thrive as we rally behind our common mission to make the digital world a safer place. From Snyk employee resource groups, to global benefits that help our employees prioritize their health, wellness, financial security, and a work/life blend, we aim to support our employees along their entire journeys here at Snyk.

Benefits & Programs

Prioritize health, wellness, financial security, and life balance with programs tailored to your location and role.
Flexible working hours, work-from-home allowances, in-office perks, and time off for learning and self-development
Generous vacation and wellness time off, country-specific holidays, and 100% paid parental leave for all caregivers
Health benefits, employee assistance plans, and annual wellness allowance
Country-specific life insurance, disability benefits, and retirement/pension programs, plus mobile phone and education allowances