Principal Software Developer (Infra / Ops)

Oracle · Enterprise · United States

This role focuses on infrastructure cloud services, specifically detecting, triaging, and mitigating OCI service-impacting events to minimize downtime. Responsibilities include incident management, automation, system operations, and defining technical architecture for large-scale distributed systems, with a focus on security, resiliency, scalability, and performance.

What you'd actually do

  1. Solve complex problems related to infrastructure cloud services and automate common tasks to ensure continuous availability with minimal human intervention.
  2. Command and coordinate SMEs and service leaders to restore services as quickly as possible during major incidents, while keeping accurate and timely data on the progress of such incidents.
  3. Utilize a deep understanding of cloud computing design patterns and their dependencies to mitigate complex major incidents.
  4. Apply a methodical approach to troubleshooting the large, complex, interconnected systems used in incident detection and orchestration.
  5. Document pertinent information related to incidents that aids process improvement, identifies deviations, and enables the creation of an incident knowledge base.

Skills

Required

  • Public cloud operations experience (e.g., AWS, Azure, GCP, OCI)
  • Strong operations experience in a cloud-based environment
  • Clear, demonstrated understanding of automation and orchestration principles
  • Experience working in at least one modern object-oriented programming language
  • Experience with professional software engineering standard methodologies such as Agile project management, coding standards, code reviews, source control management, build processes, testing, and operations
  • Familiarity with infrastructure automation tools such as Chef, Ansible, Jenkins, Terraform
  • Strong expertise with several of the following technologies: Infrastructure-as-a-Service, CI/CD systems, Docker, RESTful APIs, log analysis tools, debugging tools

Nice to have

  • AI tools and agentic experience

What the JD emphasized

  • Public cloud operations experience
  • Strong operations experience in a cloud-based environment
  • AI tools and agentic experience preferred