Senior Core Infrastructure Engineer

Oracle · Enterprise · Bengaluru, Karnataka, India

Senior Core Infrastructure Engineer at Oracle in Bengaluru, India. This role focuses on designing, implementing, and optimizing components in distributed systems with an emphasis on scalability, resiliency, and operability. Responsibilities include using data plane platforms and distributed state tools for high-volume data handling, building fault-tolerant systems, applying recovery-oriented principles, and proactively detecting and mitigating issues through tests, alarms, dashboards, and telemetry. The role also involves authoring runbooks, participating in incident response, implementing standard replication and synchronization, developing automation/IaC, and applying advanced security controls while meeting change, compliance, and documentation standards.

What you'd actually do

  1. Design, implement, and optimize components in distributed systems with an emphasis on scalability, resiliency, and operability.
  2. Deliver features and load/performance tests; use data plane platforms and distributed state tools for high-volume retrieval, storage, and processing; and review peers' implementations for scalability compliance.
  3. Build fault-tolerant paths (redundancy, replication, automatic failover), apply recovery-oriented principles, and implement retries, circuit breakers, and timeouts.
  4. Proactively detect and mitigate issues via tests, alarms, dashboards, and telemetry; author runbooks and participate in incident response and root cause analyses (RCAs).
  5. Implement standard replication and synchronization, develop automation/IaC for troubleshooting and maintenance, and apply advanced security controls (encryption, access, remediation) while ensuring change, compliance, and documentation standards are met.

Skills

Required

  • distributed systems design
  • scalability optimization
  • resiliency engineering
  • operability
  • data plane platforms
  • distributed state management tools
  • high-volume data processing
  • fault tolerance
  • recovery-oriented computing
  • retries
  • circuit breakers
  • timeouts
  • testing
  • alarms
  • dashboards
  • telemetry
  • runbook authoring
  • incident response
  • root cause analysis
  • data replication
  • synchronization
  • automation scripting
  • Infrastructure as Code (IaC)
  • security controls
  • encryption
  • access controls
  • remediation
  • change management
  • compliance
  • documentation

Nice to have

  • performance testing
  • load testing
  • multi-tenant environments