[remote] Senior Site Reliability Developer - USC Required

Oracle · Enterprise · United States

Senior Site Reliability Developer role at Oracle Health focused on operating and improving large-scale distributed systems powering AI-assisted clinical systems. The role involves building automation for reliability and efficiency, enhancing observability, participating in incident response, and supporting Kubernetes-based services. It requires experience in SRE/DevOps, cloud environments, Kubernetes, and scripting languages like Python.

What you'd actually do

  1. Operate and improve large-scale distributed systems powering Clinical AI Assistant services
  2. Build automation that improves reliability, scalability, and operational efficiency
  3. Improve observability across metrics, logging, tracing, and alerting
  4. Participate in production operations, incident response, and root-cause analysis
  5. Help build self-healing infrastructure and operational tooling

Skills

Required

  • 4–6 years of experience in Site Reliability Engineering, DevOps, Production Engineering, or related infrastructure roles
  • Experience supporting production systems in cloud or distributed environments
  • Strong Linux fundamentals and troubleshooting skills
  • Experience with Kubernetes, containers, or cloud-native infrastructure
  • Scripting or software development experience with Python, Bash, or similar languages
  • Familiarity with infrastructure as code and CI/CD workflows
  • Understanding of monitoring, alerting, and observability concepts
  • Curiosity, ownership, and a strong engineering mindset

Nice to have

  • Experience with cloud platforms such as OCI, AWS, or Azure
  • Exposure to large-scale distributed systems
  • AI/ML platform or observability tooling experience
  • Experience in regulated or high-availability environments

What the JD emphasized

  • U.S. citizenship required
  • Ability to obtain and maintain a federal security clearance