What you'd actually do

Partner with senior technical leaders across Twilio to set and communicate the reliability strategy, translating business goals into measurable outcomes.

Influence company-wide architectural decisions while balancing long-term vision with near-term and compliance needs.

Lead the design, implementation, and operation of scalable solutions and paved roads that enable reliable, high-traffic services.

Influence company-wide architectural decisions to focus on availability, performance, resilience, and cost efficiency using Kubernetes, AWS, Terraform, and modern observability.

Ensure integrity and quality across the service lifecycle; design fault-tolerant architectures, incident response, disaster recovery, and capacity/cost management.

Skills

Required

Reliability Engineering
Software Engineering
DevOps
infrastructure
backend systems
reliability
principal/architect experience
strategic technical decisions
long-term technical vision
SaaS organization reliability
cross-org technical architecture
cloud architecture
DevOps practices
large-scale systems design
microservices
production experience
operational management
scaling
partitioning strategies
performance tuning
reliability tuning
high-scale environments
Kubernetes (e.g., EKS)
deploying and managing stateful services
AWS services
infrastructure-as-code tools (Terraform or CloudFormation)
observability tools (Prometheus, Grafana, Datadog)
monitoring distributed systems
setting up alerting
programming language (Go, Python, Java)
building automation and tooling
incident response processes
SLOs/SLIs
runbooks
on-call rotations
cross-functional post-incident reviews
distributed systems principles
consensus
durability
throughput
availability tradeoffs
leading reliability improvements
data-intensive systems
mission-critical systems
collaborating with engineering teams
problem-solving
analytical skills
verbal communication
written communication
cross-functional environments
distributed environments
mentoring teams
influencing decisions
balancing long-term objectives with short-term needs
building effective working relationships

Nice to have

owning and operating large AWS footprints
Kubernetes architecture and concepts
data technologies (Apache Kafka, AWS MSK)
reliable streaming
building reliable products
high-availability systems

What the JD emphasized

15+ years of experience in Reliability Engineering, Software Engineering, or DevOps roles with a focus on infrastructure, backend systems, and reliability, including as a principal/architect.

Strong production experience, including operational management, scaling, partitioning strategies, and tuning for performance and reliability in high-scale environments.

Proven track record of leading reliability improvements in data-intensive or mission-critical systems and collaborating with engineering teams.

**Who we are**

At Twilio, we’re shaping the future of communications, all from the comfort of our homes. We deliver innovative solutions to hundreds of thousands of businesses and empower millions of developers worldwide to craft personalized customer experiences.

Our dedication to remote-first work and our strong culture of connection and global inclusion mean that no matter your location, you’re part of a vibrant team with diverse experiences making a global impact each day. As we continue to revolutionize how the world interacts, we’re acquiring new skills and experiences that make work feel truly rewarding. Your career at Twilio is in your hands.

We use Artificial Intelligence (AI) to help make our hiring process efficient. That said, every hiring decision is made by real Twilions!

See yourself at Twilio

Join the team as Twilio’s next Reliability Architect.

About the job

As an Architect in SRE, you will drive the technical strategy, vision and outcomes for Twilio’s Reliability Engineering organization. You will define and lead solutions and initiatives that ensure Twilio products are reliable worldwide, and you will define standards and guide engineering teams on best practices for designing, building, and operating resilient systems. This role is pivotal to Twilio’s commitment to operational excellence, scalability, and pragmatic, large-scale systems design in the cloud.

Responsibilities

In this role, you’ll:

Partner with senior technical leaders across Twilio to set and communicate the reliability strategy, translating business goals into measurable outcomes.
Influence company-wide architectural decisions while balancing long-term vision with near-term and compliance needs.
Lead the design, implementation, and operation of scalable solutions and paved roads that enable reliable, high-traffic services.
Influence company-wide architectural decisions to focus on availability, performance, resilience, and cost efficiency using Kubernetes, AWS, Terraform, and modern observability.
Ensure integrity and quality across the service lifecycle; design fault-tolerant architectures, incident response, disaster recovery, and capacity/cost management.
Collaborate with product and cross-functional teams to identify reliability risks and convert them into actionable designs, programs, and tooling.
Establish and champion reliability practices and drive systemic improvements.
Mentor and grow engineers and technical leaders.
Track and apply emerging SRE, cloud, and large-scale systems best practices; introduce pragmatic innovations that improve reliability at scale.

**Qualifications**

Twilio values diverse experiences from all kinds of industries, and we encourage everyone who meets the required qualifications to apply. If your career is just starting or hasn't followed a traditional path, don't let that stop you from considering Twilio. We are always looking for people who will bring something new to the table!

Required:

15+ years of experience in Reliability Engineering, Software Engineering, or DevOps roles with a focus on infrastructure, backend systems, and reliability, including as a principal/architect.
Strong experience in driving strategic technical decisions and defining long-term technical vision.
In-depth understanding of the role of Reliability Engineering in a large and diverse SaaS organization.
Experience driving cross-org technical architecture outcomes.
Knowledge of cloud architecture, DevOps practices, and large-scale systems design with microservices.
Bachelor's or Master's degree in Computer Science, Engineering, or a related field (or equivalent experience).
Strong production experience, including operational management, scaling, partitioning strategies, and tuning for performance and reliability in high-scale environments.
Hands-on experience with Kubernetes (e.g., EKS), deploying and managing stateful services, and cloud services like AWS.
Proficiency in infrastructure-as-code tools such as Terraform or CloudFormation for automating infrastructure.
Expertise in observability tools (e.g., Prometheus, Grafana, Datadog) for monitoring distributed systems and setting up alerting.
Proficient in at least one programming language (e.g., Go, Python, Java) for building automation and tooling.
Experience designing incident response processes, SLOs/SLIs, runbooks, and participating in on-call rotations.
Experience running cross-functional post-incident reviews and driving improvements.
Strong understanding of distributed systems principles, including consensus, durability, throughput, and availability tradeoffs.
Proven track record of leading reliability improvements in data-intensive or mission-critical systems and collaborating with engineering teams.
Excellent problem-solving, analytical, verbal, and written communication skills, with the ability to work in cross-functional and distributed environments.
Demonstrated leadership in mentoring teams, influencing decisions, and balancing long-term objectives with short-term needs.
Ability to influence and build effective working relationships with all levels of the organization.

Desired:

Specific experience owning and operating large AWS footprints.
Knowledge of Kubernetes architecture and concepts.
Experience with data technologies like Apache Kafka, AWS MSK, or similar for reliable streaming.
Passion for building reliable products, with prior projects in high-availability systems.

Location

This role will be remote and based on the East Coast of the USA, or remote in Ireland, the UK, or Spain.

**Travel**

We prioritize connection and opportunities to build relationships with our customers and each other. For this role, you may be required to travel occasionally to participate in project or team in-person meetings.

What We Offer

Working at Twilio offers many benefits, including competitive pay, generous time off, ample parental and wellness leave, healthcare, a retirement savings program, and much more. Offerings vary by location.

Compensation

*Please note the salary range information provided applies only to candidates residing in California, Colorado, Hawaii, Illinois, Maryland, Massachusetts, Minnesota, New Jersey, New York, Vermont, Washington D.C., and Washington State due to local requirements. Compensation for candidates in other locations will be discussed during the hiring process. Please note that hiring for this role is not restricted to the locations listed above.

The estimated pay ranges for this role are as follows:

Based in Colorado, Hawaii, Illinois, Maryland, Massachusetts, Minnesota, Vermont, or Washington D.C.: $227,840.00 (Developing Minimum for Tier 3) - $284,800.00 (Mid for Tier 3).
Based in New York, New Jersey, Washington State, or California (outside of the San Francisco Bay area): $241,200.00 (Developing Minimum for Tier 2) - $301,500.00 (Mid for Tier 2).
Based in the San Francisco Bay area, California: $268,000.00 (Developing Minimum for Tier 1) - $335,000.00 (Mid for Tier 1).
This role may be eligible to participate in Twilio’s equity plan and corporate bonus plan. All roles are generally eligible for the following benefits: health care insurance, 401(k) retirement account, paid sick time, paid personal time off, paid parental leave.

The successful candidate’s starting salary will be determined based on permissible, non-discriminatory factors such as skills, experience, and geographic location.

Twilio thinks big. Do you?

We like to solve problems, take initiative, pitch in when needed, and are always up for trying new things. That's why we seek out colleagues who embody our values — something we call Twilio Magic. Additionally, we empower employees to build positive change in their communities by supporting their volunteering and donation efforts.

So, if you're ready to unleash your full potential, do your best work, and be the best version of yourself, apply now! If this role isn't what you're looking for, please consider other open positions.

Twilio is proud to be an equal opportunity employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, reproductive health decisions, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, genetic information, political views or activity, or other applicable legally protected characteristics. We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law. Qualified applicants with arrest or conviction records will be considered for employment in accordance with the Los Angeles County Fair Chance Ordinance for Employers and the California Fair Chance Act. Additionally, Twilio participates in the E-Verify program in certain locations, as required by law.