Director, Core Infrastructure Engineering

Oracle · Enterprise · Seattle, WA +1

Director of Core Infrastructure Engineering at Oracle, leading multiple teams to implement strategies for architecture, delivery, and optimization of scalable, reliable, and secure distributed systems. Focuses on high-throughput data processing, fault-tolerant architectures, system reliability, performance monitoring, incident management, security, and automation (IaC).

What you'd actually do

  1. Leads multiple teams to implement strategies for the architecture and delivery of interdependent, scalable distributed systems that meet organizational and customer demands.
  2. Orchestrates cross-group optimization for high‑throughput, large‑scale data processing; aligns stakeholders on scalability requirements; and oversees elastic designs and effective use of data plane platforms.
  3. Provides strategic oversight for fault‑tolerant, in‑service‑upgradable architectures, sets direction for partition‑aware design choices, and leads initiatives to harden networks via load‑shedding, throttling, and rate‑limiting.
  4. Establishes expectations for formal verification and peer reviews, and sets SLO‑aligned durability and availability standards across the department.
  5. Drives KPI and telemetry strategies; directs the creation of complex dashboards and alerting for proactive health assurance; and ensures that functional/correctness validation, data replication, and synchronization meet organizational needs.

Skills

Required

  • Distributed systems architecture
  • System design and scalability
  • Reliability engineering
  • Performance optimization
  • Data processing
  • Fault tolerance
  • Network engineering
  • Load balancing
  • Rate limiting
  • Formal verification
  • SLO management
  • KPI definition
  • Telemetry and monitoring
  • Incident management
  • Security architecture
  • Compliance documentation
  • Infrastructure as Code (IaC)
  • Change management

Nice to have

  • Cloud infrastructure management
  • Multi-tenant environments
  • Customer maintenance windows
  • Standard Operating Procedures (SOPs)

What the JD emphasized

  • scalable distributed systems
  • high-throughput
  • large-scale data processing
  • fault-tolerant
  • in-service-upgradable architectures
  • partition-aware design
  • load-shedding
  • throttling
  • rate-limiting
  • formal verification
  • SLO-aligned durability and availability standards
  • KPI and telemetry strategies
  • functional/correctness validation
  • data replication
  • synchronization
  • incident management
  • operational readiness
  • customer maintenance windows
  • SOPs
  • security guidance
  • encryption
  • access controls
  • remediation
  • compliance documentation
  • automation (IaC)
  • change-management alignment
  • patching
  • updating
  • rolling back at scale