SRE Operations Engineer

Okta · Enterprise · Bangalore, India · Tech Ops-610

Okta is seeking an SRE Operations Engineer to ensure the smooth operation of its Customer Identity Cloud, focusing on production system availability and long-term operational success. The role involves executing operational tasks, monitoring platform health, assisting with testing, and acting as an escalation point for platform issues, with potential career growth into Site Reliability Engineering.

What you'd actually do

  1. Execute operational work, including updating/patching and maintaining the Engineering Service Desk queue
  2. Ensure team requests are triaged and/or actioned in a timely manner
  3. Monitor platform health and take steps to alleviate issues related to deployment and operations
  4. Assist with capacity, performance, and scalability testing where required
  5. Act as an escalation point for platform issues from customer support teams

Skills

Required

  • General platform infrastructure knowledge, including high availability / load balancing concepts, routers, firewalls and storage subsystems
  • Sound understanding of protocols/technologies like HTTP, SSL, SSH and Kubernetes
  • Familiarity with a variety of open source technologies and tools like MongoDB and Node.js
  • Experience with monitoring and troubleshooting techniques
  • Ability to communicate clearly with a diverse range of stakeholders across multiple domains
  • Multitasking and time management skills
  • 1+ years in a Cloud Operations role
  • 1+ years in a production environment supporting large-scale, mission-critical applications

Nice to have

  • Knowledge of Terraform
  • Familiarity with cloud platforms like AWS and Azure
  • Linux fundamentals and knowledge of tools like Datadog
  • Interest in and/or an understanding of programming, e.g. Go, shell scripting

What the JD emphasized

  • ensuring customer availability expectations are exceeded in every way
  • ensuring production systems remain operational at all times
  • potential career growth into Site Reliability Engineering