Manager, Site Reliability Engineering

Okta · Enterprise · San Francisco, CA · SW Eng - Infrastructure-672

Manager of Site Reliability Engineering at Okta, focusing on scaling and reliability of the IDaaS platform, which supports AI initiatives. The role involves managing SRE teams, driving DevOps maturity, developing tooling, and ensuring high availability and cost-effectiveness of infrastructure, particularly in cloud-native environments (AWS, Kubernetes, Terraform, CI/CD, Observability).

What you'd actually do

  1. Manage a team of SREs supporting the workloads and teams behind our IDaaS platform.
  2. Drive the microservice journey, DevOps maturity, and workload reliability in tandem with architects and teams across the organization.
  3. Accelerate the velocity of SRE and product engineering by developing powerful tooling, intuitive self-service capabilities, and robust self-healing patterns.
  4. Lead, mentor, and grow a high-performing team of engineers and managers across platform, infrastructure, and shared services domains.
  5. Perform engineering design evaluations and ensure the completion of projects within resource, budget, and scheduling constraints.

Skills

Required

  • 3+ years of experience in technical leadership & people management
  • Extensive experience using Agile and DevOps methodologies to build product infrastructure and shared services at scale
  • Experience running large-scale infrastructure platforms supporting a SaaS/Cloud service in a public Cloud, preferably AWS.
  • Strong expertise in cloud-native architectures, containerization (Kubernetes), IaC (Terraform), and CI/CD pipelines
  • Strong background and hands-on experience in software development, PaaS, and automation
  • Deep experience building and operating observability platforms and monitoring tools (Grafana, Splunk, APM, etc.) in a large-scale environment
  • Effective verbal and written communication and interpersonal skills
  • Computer Science degree, related degree, or equivalent experience

Nice to have

  • Experience supporting a multi-cloud environment

What the JD emphasized

  • Requires 2 days a week in our San Francisco office
  • Extensive experience using Agile and DevOps methodologies to build product infrastructure and shared services at scale
  • Strong expertise in cloud-native architectures, containerization (Kubernetes), IaC (Terraform), and CI/CD pipelines
  • Deep experience building and operating observability platforms and monitoring tools (Grafana, Splunk, APM, etc.) in a large-scale environment
  • Computer Science degree, related degree, or equivalent experience