Site Reliability Engineer (Auth0) at Okta

What you'd actually do

Design and build custom software in Go to enhance the platform's reliability, resiliency, and redundancy.

Partner with engineering teams to embed reliability principles, improving the availability, performance, and observability of our services.

Use your deep understanding of infrastructure and observability principles to identify opportunities for improvement within the product and implement solutions.

Contribute to our on-call rotation, providing rapid, effective response to critical incidents and using your expertise to troubleshoot, mitigate or accurately escalate production issues.

Develop and refine our SRE tooling and processes, focusing on automation and operational efficiency.

Skills

Required

Go programming language
Infrastructure as code (Terraform)
Container orchestration (Kubernetes, Docker)
GitOps (ArgoCD)
Major cloud provider (Azure, AWS, or GCP)
Microservices architecture
Databases (SQL, NoSQL)
Networking fundamentals
SRE principles (SLIs, SLOs, error budgets)
On-call rotation experience
Communication and collaboration skills

Nice to have

Custom software development
Observability principles
Automation
Operational efficiency

What the JD emphasized

career-defining work

relentless drive to solve complex challenges

speed and urgency

execute with excellence

mission

exponential growth

directly contributing to the platform's core resiliency and robustness

hands-on builder

high degree of ownership

high degree of autonomy

custom applications, not just scripts

major cloud provider

microservices architecture

networking fundamentals

custom code can solve platform-level issues

core SRE principles

on-call rotation for a 24/7 cloud-based environment

Exceptional communication and collaboration skills

remote, distributed team

self-driven

massive scale

curious and motivated engineer

building reliability directly into the platform

**Secure Every Identity, from AI to Human**

Identity is the key to unlocking the potential of AI. Okta secures AI by building the trusted, neutral infrastructure that enables organizations to safely embrace this new era. This work requires a relentless drive to solve complex challenges with real-world stakes. We are looking for builders and owners who operate with speed and urgency and execute with excellence.

This is an opportunity to do career-defining work. We're all in on this mission. If you are too, let's talk.

Auth0 provides an unparalleled authentication experience for hundreds of millions of users worldwide. Our commitment to reliability is a key foundation of our product, and our dedication to exceeding customer availability expectations is a core engineering focus. As a mid-level Site Reliability Engineer, you'll join our SRE team based in Europe to ensure our production systems are not only operational but also resilient, scalable, and ready for exponential growth. This isn't just about keeping the lights on; it's about directly contributing to the platform's core resiliency and robustness. You'll be a hands-on builder, crafting solutions that make our system more reliable by design.

What you’ll do:

Design and build custom software in Go to enhance the platform's reliability, resiliency, and redundancy.
Partner with engineering teams to embed reliability principles, improving the availability, performance, and observability of our services.
Use your deep understanding of infrastructure and observability principles to identify opportunities for improvement within the product and implement solutions.
Contribute to our on-call rotation, providing rapid, effective response to critical incidents and using your expertise to troubleshoot, mitigate or accurately escalate production issues.
Develop and refine our SRE tooling and processes, focusing on automation and operational efficiency.
Define, document, and champion reliability best practices across the organisation.

What you'll need to be successful:

This role requires a unique blend of a software engineer's mindset and operational expertise. You'll thrive in this role if you have:

A proactive and systematic approach to problem-solving, with a high degree of ownership.
Proven experience in a production environment supporting large-scale, mission-critical applications with a high degree of autonomy.
Proficiency in at least one programming language, with a strong preference for Go. You should be comfortable writing custom applications, not just scripts.
Experience with infrastructure as code (Terraform), container orchestration (Kubernetes, Docker) and GitOps (ArgoCD).
Demonstrable expertise in a major cloud provider (Azure, AWS, or GCP).
A strong grasp of microservices architecture, databases (SQL, NoSQL), and networking fundamentals, so you can understand how custom code can solve platform-level issues.
An understanding of core SRE principles, including SLIs, SLOs, and error budgets.
Experience in an on-call rotation for a 24/7 cloud-based environment.
Exceptional communication and collaboration skills, with a proven ability to work effectively in a remote, distributed team, where tasks may be self-driven.

We're looking for someone who wants not just a job but a career-defining opportunity to tackle complex challenges at a massive scale. If you're a curious and motivated engineer who's passionate about building reliability directly into the platform, we'd love to hear from you.

**The Okta Experience**

Supporting Your Well-Being
Driving Social Impact
Developing Talent and Fostering Connection + Community

We are intentional about connection. Our global community, spanning over 20 offices worldwide, is united by a drive to innovate. Your journey begins with an immersive, in-person onboarding experience designed to accelerate your impact and connect you to our mission and team from day one.

Okta is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, marital status, age, physical or mental disability, or status as a protected veteran. We also consider for employment qualified applicants with arrest and conviction records, consistent with applicable laws.

If reasonable accommodation is needed to complete any part of the job application, interview process, or onboarding, please use this Form to request an accommodation.

Notice for New York City Applicants & Employees: Okta may use Automated Employment Decision Tools (AEDT), as defined by New York City Local Law 144, that use artificial intelligence, machine learning, or other automated processes to assist in our recruitment and hiring process. In accordance with NYC Local Law 144, if you are an applicant or employee residing in New York City, please click here to view our full NYC AEDT Notice.

Okta is committed to complying with applicable data privacy and security laws and regulations. For more information, please see our Personnel and Job Candidate Privacy Notice at https://www.okta.com/legal/personnel-policy/.