Staff Site Reliability Engineer

Okta · Enterprise · Bangalore, India · Tech Ops-610

Okta is seeking a Staff Site Reliability Engineer to build and operate highly scalable, reliable, and secure infrastructure powering its production systems across AWS and GCP, with a focus on identity and securing AI. The role involves leading reliability and modernization initiatives, serving as a technical authority on Kubernetes and cloud infrastructure, and partnering with development teams to enable microservice-based applications.

What you'd actually do

  1. Design, build, and operate highly scalable, reliable, and secure infrastructure powering our production systems across AWS and GCP.
  2. Lead major reliability and modernization initiatives, including container platform migrations (e.g., ECS to EKS/GKE) and microservice enablement across multi-cloud environments.
  3. Serve as a technical authority in Kubernetes (EKS and GKE), cloud infrastructure (AWS and GCP), and modern CI/CD practices (GitOps, automation pipelines).
  4. Partner with development teams to architect and enable microservice-based applications, ensuring production readiness, scalability, and observability.
  5. Implement and manage infrastructure as code (Terraform, Ansible) to automate provisioning, scaling, and configuration management across multiple cloud providers.

Skills

Required

  • Kubernetes (EKS and GKE)
  • AWS
  • GCP
  • Terraform
  • Ansible
  • Python
  • Go
  • Shell
  • CI/CD pipelines
  • Linux systems
  • Networking fundamentals
  • Redis
  • Observability tools
  • Container security
  • Secrets management

Nice to have

  • SaaS experience
  • High-scale, cloud-native environments

What the JD emphasized

  • Kubernetes (EKS and GKE)
  • AWS and GCP
  • Terraform
  • ECS to EKS/GKE migrations
  • Microservice enablement