Manager Site Reliability Engineer

Okta · Enterprise · Bangalore, India · SW Eng - Infrastructure-672

The Manager Site Reliability Engineer at Okta focuses on building and scaling internal developer platforms, enhancing automation for fleet management, and ensuring the reliability and performance of production systems. The role involves leading a team of SWEs and SREs, triaging production issues, and partnering with stakeholders to align capabilities with reliability, security, and delivery velocity. It emphasizes a Platform-as-Product mindset, engineering excellence in tooling, and an automation-first approach.

What you'd actually do

  1. Lead the architecture, design, and rollout of the internal developer platform, spanning CI/CD, tooling, infrastructure as code (IaC) integrations, and modernization efforts
  2. Enhance existing automation for fleet management and build new features to meet the needs of the infrastructure platform
  3. Spearhead initiatives and projects that improve engineer productivity by identifying and mitigating bottlenecks in the development flow
  4. Mentor, manage, and lead a team of SWEs and SREs with a broad range of expertise and experience
  5. Triage and troubleshoot complex production issues to ensure reliability and performance

Skills

Required

  • Experience managing teams running large-scale production Java/Tomcat and containerized services in AWS (EC2, ECS, KMS, Kinesis, RDS) or other cloud providers.
  • Deep knowledge of CI/CD principles, Linux fundamentals, OS hardening, networking concepts, and IP protocols.
  • Leadership, communication, and project management skills.
  • Security background and knowledge.

Nice to have

  • Experience building and scaling internal developer platforms (IDP).
  • Experience building robust, automated tooling for CI/CD orchestration, Kubernetes operators, or self-service infrastructure provisioning (IaC).
  • History of transitioning teams from manual, "toil-heavy" operations to automated, code-driven workflows.
  • Experience in a cloud-native environment.

What the JD emphasized

  • Platform-as-Product Mindset
  • Engineering Excellence in Tooling
  • Automation First
  • 3+ years of experience managing SWE or SRE teams, ideally in a cloud-native environment.
  • Strong security background and knowledge.