Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services. Design and develop architectures, standards, and methods for large-scale distributed systems. Facilitate service capacity planning and demand forecasting, software performance analysis, and system tuning.

As a Site Reliability Engineer, you will work with the Production Engineering and SRE teams to own, run, and improve critical healthcare services. You will help keep our cloud-native EHR platforms reliable, secure, scalable, and easy to operate.

You will understand how services are built, deployed, monitored, and supported in production. You will work closely with development teams to improve service design, reduce failures, automate manual work, and improve performance.

You will also help use AI and AIOps to improve operations, including smarter alerting, faster incident detection, automated troubleshooting, and better root cause analysis.

Key Responsibilities

Own the reliability, availability, performance, and operations of production services.
Support cloud-native EHR platforms built with microservices, Kubernetes, and OCI.
Understand service architecture, dependencies, capacity, security, and failure points.
Improve monitoring, alerting, observability, and incident response.
Use AI, automation, and AIOps to reduce manual work and improve system health.
Build tools and scripts for deployment, monitoring, recovery, and operational tasks.
Troubleshoot complex production issues and drive them to resolution.
Lead root cause analysis for major incidents and help prevent repeat issues.
Partner with development teams to improve service design and operability.
Create and maintain SOPs, runbooks, dashboards, and knowledge articles.
Support migration and modernization of existing hosting environments to OCI.
Participate in a 24x7 on-call rotation for critical services.

AI and Automation Focus

Design and support AI-driven operational automation.
Use AI/AIOps for anomaly detection, alert correlation, and incident insights.
Help build self-healing and auto-remediation capabilities.
Apply AI safely to improve reliability, supportability, and customer experience.
Work with engineering teams to bring applied AI into production operations.

What You Bring

3 to 5+ years of experience with production systems or distributed platforms.
Strong experience with Java and scripting using Python or Shell.
Good knowledge of microservices, Kubernetes, and cloud platforms.
Experience with OCI, AWS, Azure, or GCP.
Strong troubleshooting and debugging skills.
Experience with monitoring, logging, alerting, and observability tools.
Knowledge of REST APIs, JSON/XML, SQL, and secure data handling.
Experience with automation, CI/CD, and production deployment.
Ability to handle customer-impacting issues and technical escalations.
Experience with AI/ML, AIOps, or automation in production is a plus.

Nice to Have

Experience with EHR or healthcare platforms.
Knowledge of HL7 or FHIR.
Oracle Health or New Millennium experience.
Oracle Database experience.
Strong Kubernetes and OCI experience.

Career Level - IC3