At Klaviyo, we value the unique backgrounds, experiences, and perspectives each Klaviyo (we call ourselves Klaviyos) brings to our workplace every day. We believe everyone deserves a fair shot at success and appreciate the experiences each person brings beyond the traditional job requirements. If you’re a close but not exact match with the description, we hope you’ll still consider applying. Want to learn more about life at Klaviyo? Visit klaviyo.com/careers to see how we empower creators to own their destiny.

Lead Software Engineer, Reliability (Dublin)

Team Overview

As a Lead Software Engineer, Reliability, you will set technical direction and lead reliability strategy for Klaviyo’s most critical platforms. You’ll ensure our systems are reliable, scalable, and sustainable while enabling rapid product development across the company.

We treat reliability as a core product feature. Our work spans security, infrastructure, and software engineering, requiring deep systems thinking and strong technical leadership. We build foundational services that must be extremely reliable, secure, and performant at global scale.

The SRE team’s charter is to design, build, and operate foundational infrastructure and services, define reliability standards, reduce operational toil through automation, and continuously improve systems based on production learnings. As a lead, your work will be highly visible and will directly influence how Klaviyo builds software and how customers experience our platform every day.

How you’ll make an impact

As a Lead Software Engineer, Reliability, you will provide technical leadership while remaining hands-on with the systems that underpin Klaviyo’s reliability and operational excellence. You will:

Set the technical vision and long-term strategy for reliability, availability, and operational excellence across critical platforms
Lead the design, implementation, and evolution of foundational, security-critical services with strong guarantees around availability, scalability, latency, and fault tolerance
Drive adoption of SRE best practices across engineering teams, including SLIs, SLOs, error budgets, and reliability-based decision making
Identify systemic reliability risks and architectural bottlenecks, and lead cross-team initiatives to address them with durable, preventative solutions
Apply software engineering principles to automate infrastructure, eliminate operational toil, and improve system reliability at scale
Own and continuously improve observability, alerting, and incident response practices to reduce mean time to detection and recovery
Guide on-call strategy and operational processes to ensure sustainability, automation, and healthy operational load
Perform and lead quantitative analysis around system behavior, capacity planning, scaling limits, and performance characteristics
Partner closely with product, platform, and security leaders to influence system architecture early and ensure reliability is built in from the start
Lead incident response for high-severity events, driving effective mitigation, communication, and follow-up
Mentor senior and mid-level engineers, raising the bar for technical quality, operational maturity, and reliability culture across the organization
Review and influence technical designs, platform APIs, operational runbooks, and system documentation at an organizational level
You’ve already experimented with AI in work or personal projects, and you’re excited to dive in and learn fast. You’re hungry to responsibly explore new AI tools and workflows, finding ways to make your work smarter and more efficient.

Who you are

You are a senior technical leader who combines deep systems expertise with strong judgment and influence. You:

Are a cloud-native, platform-focused SRE who uses software to design and operate highly reliable production systems at scale
Write and maintain production-quality code (e.g. Python, Go, or similar) to build internal platforms, automate operations, and improve system reliability
Have led the design and operation of distributed, cloud-native systems and deeply understand failure modes such as partial outages, dependency failures, resource saturation, and cascading impact
Have extensive experience operating containerized workloads and platforms (e.g. Kubernetes) in production, including deployment strategies, scaling behavior, and service networking
Are comfortable owning on-call strategy and participating in escalation for complex production incidents
Have designed and evolved observability platforms and alerting strategies that reflect real customer and service impact
Apply SRE concepts such as SLIs, SLOs, error budgets, and burn-rate–based alerting to guide engineering priorities and trade-offs at a team or org level
Have strong hands-on experience with infrastructure as code and declarative configuration (e.g. Terraform, Kubernetes manifests, policy-as-code)
Have led capacity planning, load testing, and performance analysis efforts for large-scale distributed systems
Drive high-quality post-incident reviews and ensure concrete, code-focused follow-up actions are delivered and sustained
Are comfortable leading technical discussions, influencing architecture, and providing clear guidance across multiple teams

Nice to have

Experience leading or supporting critical platforms or internal tooling
Familiarity with identity, access management, secrets management, or policy enforcement systems
Experience operating systems at scale in cloud environments (AWS preferred)
Background in resilience testing, fault injection, or chaos engineering
Strong understanding of algorithms and data structures as they apply to large-scale systems

Tech Stack

Klaviyo’s platform is primarily built with Python and React and runs on AWS. Engineers join us from a wide range of technical backgrounds and are supported in learning our stack.

Core technologies include:

Python / Django / FastAPI
MySQL / Redis / Memcached
RabbitMQ / Celery / Apache Kafka / Apache Pulsar
AWS / Terraform / Kubernetes

Location & Work Model

This role is based in Dublin, Ireland and follows a hybrid working model. Klaviyo supports work authorization and relocation for this position.

At Klaviyo, we enjoy tackling meaningful engineering challenges and value people who take ownership, learn continuously, and collaborate openly. We are committed to building inclusive teams and encourage applications from candidates of all backgrounds.

We use Covey as part of our hiring and/or promotional process. For jobs or candidates in NYC, certain features may qualify it as an AEDT. As part of the evaluation process, we provide Covey with job requirements and candidate-submitted applications. We began using Covey Scout for Inbound on April 3, 2025.

Please see the independent bias audit report covering our use of Covey here

Our salary range reflects the cost of labour in the country where the job post is advertised. The base salary offered for this position is determined by several factors, including the applicant’s job-related skills, relevant experience, education or training, and work location.

In addition to base salary, our total compensation package may include participation in the company’s annual cash bonus plan, variable compensation (OTE) for sales and customer success roles, equity, sign-on payments, and a comprehensive range of health, welfare, and wellbeing benefits based on eligibility.

Your recruiter can provide more details about the specific salary/OTE range for your preferred location during the hiring process.

Base Pay Range in Local Currency:

€112,000–€168,000 EUR

This role may require up to 10% travel for purposes such as new hire onboarding, client or partner work if applicable, team meetings, and industry events. Travel is coordinated in advance.

Get to Know Klaviyo

We’re Klaviyo (pronounced clay-vee-oh). We empower creators to own their destiny by making first-party data accessible and actionable like never before. We see limitless potential for the technology we’re developing to nurture personalized experiences in ecommerce and beyond. To reach our goals, we need our own crew of remarkable creators—ambitious and collaborative teammates who stay focused on our north star: delighting our customers. If you’re ready to do the best work of your career, where you’ll be welcomed as your whole self from day one and supported with generous benefits, we hope you’ll join us.

AI fluency at Klaviyo includes responsible use of AI (including privacy, security, bias awareness, and human-in-the-loop). We provide accommodations as needed.

By participating in Klaviyo’s interview process, you acknowledge that you have read, understood, and will adhere to our Guidelines for using AI in the Klaviyo interview Process. For more information about how we process your personal data, see our Job Applicant Privacy Notice.

Klaviyo is committed to a policy of equal opportunity and non-discrimination. We do not discriminate on the basis of race, ethnicity, citizenship, national origin, color, religion or religious creed, age, sex (including pregnancy), gender identity, sexual orientation, physical or mental disability, veteran or active military status, marital status, criminal record, genetics, retaliation, sexual harassment or any other characteristic protected by applicable law.

IMPORTANT NOTICE: Our company takes the security and privacy of job applicants very seriously. We will never ask for payment, bank details, or personal financial information as part of the application process. All our legitimate job postings can be found on our official career site. Please be cautious of job offers that come from email addresses not ending in @klaviyo.com, from instant messaging platforms, or from unsolicited calls.

By clicking "Submit Application" you consent to Klaviyo processing your Personal Data in accordance with our Job Applicant Privacy Notice. If you do not wish for Klaviyo to process your Personal Data, please do not submit an application. You can find our Job Applicant Privacy Notice here and here (FR).