Principal Site Reliability Engineer

Oracle · Enterprise · Seattle, WA +1

The Principal Site Reliability Engineer designs, architects, and maintains reliable, scalable infrastructure and services. The role focuses on capacity management, incident response, automation, and performance optimization. It also involves collaborating with development teams to ensure infrastructure meets deployment requirements, and experimenting with new tools and technologies to improve site reliability over time.

What you'd actually do

  1. Designs and architects infrastructure and services to meet reliability and functionality requirements.
  2. Forecasts demands for infrastructure and responds to capacity needs, ensuring systems have sufficient resources to handle current and future workloads and identifying resource gaps.
  3. Collaborates with the software development team to develop infrastructure, ensuring features are reliable and scalable according to deployment requirements.
  4. Exercises judgment in data collection, triage, technical analysis, and issue redirection to maintain and optimize operations and infrastructure reliability.
  5. Identifies and recommends opportunities for automation and assesses potential benefits to enhance operational efficiency.

Skills

Required

  • infrastructure design
  • service architecture
  • capacity planning
  • incident response
  • root cause analysis
  • automation development
  • performance reporting
  • technical communication
  • troubleshooting
  • site reliability trends

Nice to have

  • prototyping
  • on-call support
  • security standards adherence
  • business development decision support