Senior Site Reliability Engineer

Oracle · Enterprise · United States

Senior Site Reliability Engineer responsible for designing, architecting, and maintaining reliable and scalable infrastructure. This role involves capacity planning, incident response, automation, performance monitoring, and collaborating with development teams to ensure system functionality and optimize operations. The engineer will also experiment with new tools and stay updated on site reliability trends.

What you'd actually do

  1. Proactively designs and architects infrastructure and services to meet reliability and functionality requirements.
  2. Forecasts infrastructure demand and responds to capacity needs, ensuring systems have sufficient resources to handle current and future workloads.
  3. Collaborates with software development teams to build infrastructure and features that are reliable, scalable, and aligned with deployment requirements.
  4. Performs data collection, triage, technical analysis, and redirection to maintain and optimize operations and infrastructure reliability.
  5. Applies in-depth knowledge to perform incident response, root cause analysis, and maintenance on assigned services (e.g., software installs, version upgrades, security updates, backup and recovery).

Skills

Required

  • infrastructure design and architecture
  • capacity planning
  • system reliability
  • incident response
  • root cause analysis
  • automation scripting
  • performance monitoring
  • collaboration with development teams
  • troubleshooting
  • technical communication

Nice to have

  • experience with new tools and technologies
  • knowledge of site reliability trends
  • on-call support