Staff Site Reliability Engineer (SRE), Agile

Okta · Enterprise · San Francisco, CA · Tech Ops-610

Okta is seeking an experienced Staff Site Reliability Engineer (SRE) to join its Infrastructure Platform AGILE SRE team. The role focuses on providing cross-functional support, building critical infrastructure, and strengthening internal tooling and operational capabilities. Responsibilities include investigating and resolving infrastructure issues, providing technical guidance, contributing to documentation, mentoring junior team members, and improving monitoring, alerting, and incident response processes. Required qualifications include 7+ years of SRE or equivalent systems administration experience, proficiency with Kubernetes, a strong Linux/Unix administration background, and a good understanding of CI/CD, networking concepts, and infrastructure as code.

What you'd actually do

  1. Investigate and resolve infrastructure issues reported by internal teams
  2. Provide technical guidance and support across multiple technical domains
  3. Contribute to runbooks, documentation, and knowledge sharing
  4. Mentor junior team members on SRE best practices and troubleshooting methodologies
  5. Identify and implement improvements to monitoring, alerting, and incident response processes

Skills

Required

  • 7+ years of Site Reliability Engineering or equivalent systems administration experience
  • Proficiency with Kubernetes and container orchestration
  • Strong Linux/Unix systems administration background
  • Good understanding of CI/CD and deployment strategies
  • Good grasp of networking concepts
  • Experience with infrastructure as code, infrastructure troubleshooting, and general architecture
  • Excellent communication and documentation skills

Nice to have

  • Kubernetes
  • Terraform
  • Golang
  • Python
  • Experience working across multiple teams in a cross-functional capacity
  • Familiarity with compliance and change management processes