Senior Manager, Site Reliability Engineering - Infrastructure Platform

Okta · Enterprise · Bellevue, WA · Tech Ops-610

Okta is seeking a Senior Manager of Infrastructure Platform and Shared Services to lead the teams responsible for scaling its identity platform, which is crucial for securing AI. The role oversees Edge networking, the Kubernetes (K8s) platform, CI/CD, observability, and automation tooling, with a focus on building a world-class observability platform and accelerating engineering velocity through robust platforms and tooling. The position requires extensive experience with Agile, DevOps, cloud-native architectures, Kubernetes, IaC, CI/CD, and observability platforms.

What you'd actually do

  1. Lead the Infrastructure Platform and Shared Services org, along with initiatives across the SRE & Infrastructure organization.
  2. Lead the DevOps transformation, the microservices journey, and next-generation infrastructure platform capabilities in partnership with architects and product engineering.
  3. Build a world-class observability platform and self-service monitoring capabilities.
  4. Accelerate the velocity of SRE and product engineering by developing robust platforms, powerful tooling, and intuitive self-service capabilities.
  5. Own the design and operation of scalable, self-service Cloud infrastructure platforms (e.g., Kubernetes, service mesh, CI/CD pipelines, IaC, and Edge infrastructure).

Skills

Required

  • technical leadership
  • people management
  • Agile and DevOps methodologies
  • SaaS/Cloud service infrastructure
  • AWS
  • cloud-native architectures
  • containerization (Kubernetes)
  • IaC (Terraform)
  • CI/CD pipelines
  • software development
  • PaaS
  • automation
  • observability platforms
  • monitoring tools (Grafana, Splunk, APM, etc.)
  • cross-functional team leadership
  • large-scale program management
  • communication skills
  • interpersonal skills
  • Computer Science Degree or equivalent experience

Nice to have

  • multi-Cloud environment experience

What the JD emphasized

  • 6+ years of experience in technical leadership & people management
  • 3+ years of experience running large-scale infrastructure platforms supporting a SaaS/Cloud service in a public Cloud, preferably AWS.
  • Strong expertise in cloud-native architectures, containerization (Kubernetes), IaC (Terraform), and CI/CD pipelines.
  • Deep experience building and operating observability platforms and monitoring tools (Grafana, Splunk, APM, etc.) in a large-scale environment.