Site Reliability Developer 4

Oracle · Enterprise · United States

This role focuses on Site Reliability Engineering (SRE) for cloud services, emphasizing the design, development, and deployment of software that improves availability, scalability, and efficiency. Responsibilities include managing large-scale distributed systems, capacity planning, performance analysis, and system tuning, as well as ensuring the security, resiliency, scale, and performance of production services. The role requires full-stack ownership, an understanding of each service's end-to-end characteristics, and the ability to apply automation and orchestration principles to troubleshoot complex issues and prevent problems from recurring.

What you'd actually do

  1. Solve complex problems in cloud infrastructure services and build automation to prevent them from recurring.
  2. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services.
  3. Develop architectures, standards, and methods for large-scale distributed systems.
  4. Facilitate service capacity planning, demand forecasting, software performance analysis, and system tuning.
  5. Work with the SRE team on shared full-stack ownership of a collection of services and/or technology areas.

Skills

Required

  • distributed systems
  • automation
  • orchestration
  • performance analysis
  • system tuning
  • capacity planning
  • software development
  • cloud services

Nice to have

  • security
  • resiliency
  • scalability
  • efficiency