Senior Site Reliability Engineer

Oracle · Enterprise · Pleasanton, CA +1

Senior Site Reliability Engineer to support mission-critical cloud services and production operations. Focuses on improving service reliability, reducing operational risk, automating tasks, and speeding up issue detection and resolution. Works with development, infrastructure, security, and operations teams on monitoring, troubleshooting, incident response, observability, and reliability best practices. Responsibilities include analyzing failures, building automation, supporting deployments, capacity planning, disaster recovery, and operational readiness.

What you'd actually do

  1. Proactively designs and architects infrastructure and services to meet reliability and functionality requirements.
  2. Forecasts demands for infrastructure and responds to capacity needs, ensuring systems have sufficient resources to handle current and future workloads.
  3. Collaborates with the software development team to develop infrastructure and features that are reliable and scalable according to deployment requirements.
  4. Performs data collection, triage, technical analysis, and redirection to maintain and optimize operations and infrastructure reliability.
  5. Independently monitors services, maintains up-to-date knowledge of their performance, and documents their condition.

Skills

Required

  • Site Reliability Engineering
  • Cloud Services
  • Production Operations
  • Service Reliability
  • Operational Risk Management
  • Automation
  • Incident Response
  • Observability
  • Reliability Best Practices
  • Troubleshooting
  • Capacity Planning
  • Disaster Recovery
  • Operational Readiness
  • Infrastructure Design
  • Service Architecture
  • Performance Monitoring
  • Data Collection
  • Technical Analysis
  • Provisioning
  • Decommissioning
  • Scripting
  • Testing
  • Technical Communication
  • Root Cause Analysis
  • Post-mortem Procedures
  • Performance Bottleneck Analysis
  • Deployment Optimization
  • Resource Usage Optimization
  • Scalability
  • Trend Analysis
  • Business Development Support
  • Workload Management
  • On-call Support

Nice to have

  • Experience with new tools and technologies
  • Knowledge of site reliability trends