Site Reliability Developer 3

Oracle · Enterprise · Japan

This Site Reliability Developer role focuses on operating and improving the reliability, scalability, and performance of the Japan Sovereign Cloud platform. Responsibilities include automating operations, resolving production issues, enhancing service resiliency, and participating in a 24x7 shift rotation. The role requires experience in SRE, Systems Engineering, Cloud Operations, or DevOps, with knowledge of cloud computing, distributed systems, and automation technologies. Proficiency in a scripting or programming language such as Python, Java, Go, or Shell is also required.

What you'd actually do

  1. help operate and improve the reliability, scalability, and performance of the Japan Sovereign Cloud platform
  2. leverage software engineering principles to automate operations, resolve complex production issues, and enhance service resiliency
  3. partner with shift teams to capture recurring operational issues, improve alert actionability, maintain operational documentation, and contribute practical fixes through tooling, automation, and process improvements
  4. learn day-to-day sovereign cloud operations, follow shift procedures, and identify recurring operational pain points
  5. improve runbooks, alert response guidance, and operational handoff quality

Skills

Required

  • Site Reliability Engineering
  • Systems Engineering
  • Cloud Operations
  • DevOps
  • Software Development
  • Linux-based production environments
  • cloud computing
  • networking
  • distributed systems
  • automation technologies
  • Python
  • Java
  • Go
  • Shell
  • Japanese language proficiency
  • English communication skills

Nice to have

  • experience supporting mission-critical workloads
  • experience supporting sovereign cloud platforms

What the JD emphasized

  • 24x7 shift workflows
  • 24x7 on-call and shift-based operational support model