Remote - Site Reliability Developer 3 (usc)

Oracle · Enterprise · United States

This is a Site Reliability Developer role (an SRE position) supporting the core data platforms behind Oracle Health's Data & Analytics Platform. The focus is on the reliability, scalability, and operability of large-scale, stateful distributed systems such as Hadoop, Kafka, and Storm, automated with Ansible and Terraform. The role spans platform ownership, architectural design, operations engineering, distributed systems expertise, security, and incident leadership within a regulated healthcare data environment.

What you'd actually do

  1. Own the end-to-end reliability, scalability, and operability of shared data platforms
  2. Define platform standards, architectural direction, and operational guardrails
  3. Establish capacity models, scaling strategies, and operational best practices
  4. Operate and evolve stateful distributed systems where data placement, replication, and recovery are critical
  5. Design and evolve an Ansible- and Terraform-driven automation framework

Skills

Required

  • Operating large-scale, customer-facing distributed platforms
  • HDFS, YARN, HBase, Kafka, Storm, or similar systems
  • Linux
  • Networking
  • Distributed system troubleshooting
  • Ansible
  • Terraform
  • Python
  • Ruby
  • Bash
  • Kerberized environments
  • Technical architecture documentation
  • Platform ownership
  • Observability
  • Capacity modeling
  • Computer Science fundamentals

Nice to have

  • Cloud momentum
  • Entrepreneurial spirit
  • Energetic and creative environment
  • World class engineering center
  • Focus on excellence
  • Product development
  • Product strategy
  • Modernized, automated healthcare
  • Net new line of business
  • Impact and disrupt the healthcare industry
  • Transforming how healthcare and technology intersect
  • Reach billions of people with our products & services
  • Create technology that truly impacts the world
  • Ability to have immediate impact on developing technology
  • Unlimited growth potential with inspiring work
  • Work with the best minds in the industry
  • Enjoy working in an open, diverse, and productive environment
  • HealtheIntent

What the JD emphasized

  • U.S. citizenship and eligibility for a federal security clearance required
  • 4+ years operating large-scale, customer-facing distributed platforms
  • Deep experience with HDFS, YARN, HBase, Kafka, Storm, or similar systems
  • Strong background in Linux, networking, and distributed system troubleshooting
  • Infrastructure-as-Code using Ansible and Terraform
  • Scripting and automation using Python, Ruby, and Bash
  • Hands-on experience operating Kerberized environments
  • Proven ability to define and document technical architecture for complex systems
  • Demonstrated ownership of shared platforms with broad blast radius and multiple downstream consumers
  • Experience designing observability and capacity models for distributed platforms
  • BS or MS in Computer Science, or equivalent