Staff Site Reliability Engineer, TCore (FedRAMP)

Okta · Enterprise · San Francisco, CA · SW Eng - Infrastructure-672

Okta is seeking a Staff Site Reliability Engineer to join its TCore team, focusing on the reliability, performance, and security of Okta's core infrastructure, particularly its global traffic entry points and internal networking. The role involves designing and implementing scalable network solutions, maintaining cloud infrastructure, analyzing network data, automating infrastructure with tools like Terraform, and improving system efficiency. The position requires 8+ years of experience in a Cloud Network Engineer or related role, a deep understanding of TCP/IP and AWS/GCP networking, experience with automation tools, and proficiency in scripting languages.

What you'd actually do

  1. Work with various teams to design and implement scalable and reliable network solutions
  2. Maintain a highly available cloud infrastructure edge for the Okta identity platform
  3. Collect and analyze data to identify root causes for network-specific events
  4. Automate AWS infrastructure with Terraform and/or Chef
  5. Evolve the system by introducing changes to improve efficiency, scalability, and velocity

Skills

Required

  • 8+ years of experience in a Cloud Network Engineer or related role
  • Demonstrated in-depth understanding of the TCP/IP networking stack (layers 2 through 7). Ability to implement a highly available VPC network, including inter-VPC connectivity. Working knowledge of stateless and stateful firewalls. Familiarity with DNS, web-application firewalls, and the various load balancing methods available in the cloud.
  • Deep knowledge of AWS/GCP network concepts such as Transit Gateway / Network Connectivity Center (NCC), Site-to-Site VPN / HA VPN, and Direct Connect / Cloud Interconnect
  • Ability to troubleshoot network issues using AWS VPC Flow Logs and CloudWatch metrics, as well as GCP VPC Flow Logs / Cloud Logging, alongside standard packet captures
  • Experience working with Terraform, Ansible, Chef, Puppet or similar automation tools
  • Proficiency in Bash, Python, Golang, or similar. Experience with Git
  • Able to collaborate effectively with multiple stakeholders
  • Willingness to work on-call

Nice to have

  • Experience working in a security-oriented cloud environment
  • Working knowledge of Palo Alto next-gen virtual firewalls, including firewall implementation and configuration of security policies, routing, and GlobalProtect
  • Experience with advanced GCP-specific architectures such as Shared VPC topologies, Cloud Router BGP configurations, and Network Connectivity Center (NCC)

What the JD emphasized

  • This position requires the ability to access federal environments and/or protected federal data.