Senior Manager, Site Reliability Engineering

Oracle · Enterprise · Reston, VA +1

Senior Manager for Site Reliability Engineering (SRE) at Oracle, focusing on infrastructure, capacity management, incident response, and automation to ensure the reliability and scalability of services. The role involves leading teams, providing technical guidance, managing projects, and fostering collaboration to optimize operational efficiency and performance.

What you'd actually do

  1. Supports team members designing and architecting infrastructure and services, sharing guidance on practices and terminology for reliability and functionality.
  2. Supervises team members and provides direction to ensure accurate forecasting of infrastructure demand and timely response to capacity needs, so that systems have sufficient resources for current and future workloads and resource gaps are identified.
  3. Monitors data collection, triage, technical analysis, and redirection, ensuring team members maintain and optimize operations and infrastructure reliability.
  4. Implements standards for identifying and recommending automation opportunities, and assesses their potential to enhance operational efficiency.
  5. Serves as a senior management escalation point for incidents and complex issues arising within Oracle services.

Skills

Required

  • Site Reliability Engineering (SRE)
  • Infrastructure architecture
  • Capacity planning
  • Incident management
  • Root cause analysis
  • Automation
  • Performance monitoring
  • Scalability
  • Team leadership
  • Project management
  • Technical communication

Nice to have

  • Cloud infrastructure
  • DevOps practices
  • Prototyping
  • Release management
  • Service Level Objectives (SLOs)
  • Service Level Agreements (SLAs)