Principal Site Reliability Engineer

Oracle · Enterprise · Nashville, TN +1

Principal Site Reliability Engineer at Oracle, focusing on designing, architecting, and managing infrastructure and services for reliability and functionality. Responsibilities include capacity planning, incident response, automation, technical communication, and continuous improvement of operational efficiency and performance.

What you'd actually do

  1. Design and architect infrastructure and services to meet reliability and functionality requirements.
  2. Forecast infrastructure demand and respond to capacity needs, identifying resource gaps and ensuring systems have sufficient resources for current and future workloads.
  3. Collaborate with the software development team to develop infrastructure, ensuring features are reliable and scalable according to deployment requirements.
  4. Exercise judgment when performing data collection, triage, technical analysis, and redirection to maintain and optimize operations and infrastructure reliability.
  5. Identify and recommend opportunities for automation, and assess their potential benefits to operational efficiency.

Skills

Required

  • site reliability engineering
  • infrastructure design
  • capacity planning
  • incident response
  • root cause analysis
  • automation development
  • performance monitoring
  • technical communication
  • troubleshooting
  • experimentation with new tools

Nice to have

  • on-call support
  • SLO management
  • security standards adherence
  • business development decision support