Principal Site Reliability Engineer

Okta · Enterprise · Bangalore, India · Tech Ops-610

This Principal Site Reliability Engineer role leads reliability engineering within Okta's Emerging Products Group (EPG). It involves defining technical strategy, influencing platform architecture, establishing reliability standards, and leading initiatives to improve scalability, resilience, security, and operational excellence. The role also explores and adopts AI-assisted reliability engineering practices and designs agentic systems for operational tasks.

What you'd actually do

  1. Define and drive the reliability strategy for critical product and platform services.
  2. Own reliability architecture and operational excellence for the Spera / ISPM product area.
  3. Design, build, and operate large-scale cloud infrastructure and production services.
  4. Mentor Staff and Senior engineers across multiple teams and organizations.
  5. Lead the exploration and adoption of AI-assisted reliability engineering practices across EPG.

Skills

Required

  • Site Reliability Engineering
  • Technical leadership
  • Cloud infrastructure
  • Kubernetes
  • Terraform
  • Go
  • Python
  • Observability
  • Incident management
  • Scalability
  • Resilience
  • Security
  • Automation
  • Infrastructure-as-Code
  • GitOps

Nice to have

  • AI-assisted reliability engineering
  • Agentic systems

What the JD emphasized

  • Partner closely with the Spera / Identity Security Posture Management (ISPM) engineering organization
  • Partner closely with engineering leadership, product leadership, architects, and Staff engineers
  • Partner with engineering leaders
  • Partner with platform and product engineering teams