Staff Site Reliability Engineer, Kubernetes w/ Active TS/SCI

Okta · Enterprise · Washington, DC · Tech Ops-610

Okta is seeking a Staff Site Reliability Engineer to manage and ensure the reliability and performance of its cloud production infrastructure, with a focus on Kubernetes. The role requires experience with large-scale cloud systems, automation, and incident management in support of national security missions. It also requires an active TS/SCI clearance and familiarity with FedRAMP and DoD IL6 compliance.

What you'd actually do

  1. Design, deploy, and monitor Okta’s production infrastructure to ensure peak performance and reliability.
  2. Serve as a frontline responder to production incidents, performing deep-dive troubleshooting and implementing permanent preventive solutions.
  3. Eliminate manual toil by developing automation scripts, evolving monitoring tools, and documenting technical workflows.
  4. Support a highly available, large-scale environment as part of an on-call rotation, ensuring "Always On" service delivery.

Skills

Required

  • Active TS/SCI clearance
  • FedRAMP
  • DoD IL6
  • Kubernetes
  • Go
  • Python
  • Bash
  • Ruby
  • AWS
  • Docker
  • Linux systems administration
  • Helm
  • Terraform
  • CloudFormation

Nice to have

  • Multi-cloud environments

What the JD emphasized

  • Active TS/SCI clearance
  • FedRAMP
  • DoD IL6
  • Kubernetes