Responsible for the operation, reliability, and performance of production environments that support critical business operations. This role provides operational support across multiple systems and databases, leveraging deep technical expertise in Oracle Cloud Infrastructure (OCI) to resolve complex issues and drive continuous improvements in availability, scalability, and supportability.
Qualifications
- Experience supporting production environments in cloud infrastructure platforms, preferably OCI.
- Strong knowledge of Linux/Unix systems, databases, and distributed systems.
- Experience with troubleshooting, monitoring, automation, and incident management.
- Strong analytical and problem-solving skills.
- Ability to thrive in a fast-paced environment supporting mission-critical services.
Preferred Qualifications
- Experience with OCI, Kubernetes, Terraform, or similar cloud and infrastructure technologies.
- Knowledge of Site Reliability Engineering (SRE) and DevOps practices.
Responsibilities
- Monitor, administer, and support large-scale production environments.
- Troubleshoot and resolve complex infrastructure, application, and database issues.
- Serve as an escalation point for critical production incidents.
- Perform root cause analysis and implement preventive solutions.
- Recommend and drive improvements to system availability, performance, and operational efficiency.
- Develop automation and operational best practices to improve reliability and scalability.
- Partner with engineering and cloud teams to support growth, high-performance workloads, and high-availability requirements.
Career Level - IC3