Director, Site Reliability Engineering

Oracle · Enterprise · Seattle, WA +1

The Director of Site Reliability Engineering at Oracle leads teams that design, architect, and maintain reliable, scalable infrastructure and services. The role focuses on capacity management, incident response, automation, and technical communication to ensure optimal performance and adherence to service level objectives.

What you'd actually do

  1. Provides leadership for one or more teams that design and architect infrastructure and services, and gives input on best practices for meeting agreed reliability and functionality terms.
  2. Sets direction for other managers and senior-level individuals who forecast infrastructure demand and respond to capacity needs, ensuring that systems have sufficient resources for current and future workloads and that resource gaps are identified and addressed.
  3. Builds collaborative relationships with senior software development team members to design and develop highly reliable, scalable infrastructure that meets stringent deployment requirements.
  4. Ensures teams align on how prototyping opportunities are identified, and oversees prototyping initiatives (e.g., testing new applications or infrastructure, assisting in onboarding) that experiment with cutting-edge approaches.
  5. Ensures teams align on how they perform data collection, triage, technical analysis, and redirection, and contributes to standards that maintain and optimize operations and infrastructure reliability.

Skills

Required

  • infrastructure design
  • service architecture
  • reliability engineering
  • scalability
  • capacity planning
  • incident response
  • root cause analysis
  • automation
  • technical communication
  • team leadership
  • mentoring

Nice to have

  • cloud infrastructure
  • DevOps practices
  • performance tuning
  • security standards