Senior Site Reliability Engineer

Oracle · Enterprise · Reston, VA +1

This role focuses on designing, building, operating, and automating services for traditional IT infrastructure, specifically Oracle Linux systems. Responsibilities include capacity management, incident response, system monitoring, automation development, and troubleshooting to ensure reliability, scalability, and efficiency. The role involves managing distributed Unix-based systems, storage solutions, and implementing auto-scaling and self-healing infrastructure.

What you'd actually do

  1. Design and manage distributed Unix-based systems, particularly Oracle Linux.
  2. Implement auto-scaling and self-healing infrastructure to ensure uptime and durability.
  3. Tune system internals, including kernel parameters, networking, and filesystems, for high performance.
  4. Maintain timely OS patching and compliance posture across environments.
  5. Integrate systems with enterprise identity services such as Active Directory, LDAP, and Kerberos.

Skills

Required

  • Oracle Linux
  • Ansible
  • software development
  • Unix-based systems
  • auto-scaling
  • self-healing infrastructure
  • system internals tuning
  • kernel parameters
  • networking
  • filesystems
  • OS patching
  • compliance posture
  • enterprise identity services
  • Active Directory
  • LDAP
  • Kerberos
  • distributed storage solutions
  • GlusterFS
  • replication strategies
  • geo-replication
  • storage performance monitoring
  • storage scalability
  • automation
  • Infrastructure as Code

Nice to have

  • site reliability trends
  • business development decisions