Principal Core Infrastructure Engineer

Oracle · Enterprise · Hyderabad, Telangana, India

Lead the architecture and development of highly scalable distributed systems for an Access Governance product, focusing on performance, fault tolerance, zero-downtime upgrades, system monitoring, resilience testing, and security compliance. This role involves complex problem-solving, automation, and mentoring.

What you'd actually do

  1. Lead the architecture and development of horizontally and vertically scalable distributed systems powering a robust Access Governance product capable of handling hyper-scale data processing.
  2. Optimize code and infrastructure for high-throughput data storage and retrieval.
  3. Build fault-tolerant platforms designed to withstand network disruptions and support zero-downtime maintenance through redundancy, data replication, and automatic failover.
  4. Establish comprehensive key performance indicators and build sophisticated telemetry, dashboards, and proactive alerting mechanisms to continuously monitor system health.
  5. Proactively diagnose and resolve complex production issues while providing expert guidance during on-call incident response and root cause investigations.

Skills

Required

  • Architecture and development of scalable distributed systems
  • Optimization of code and infrastructure for high-throughput data storage and retrieval
  • Building fault-tolerant platforms
  • Zero-downtime maintenance strategies
  • Performance and load testing
  • System monitoring and telemetry
  • Incident response and root cause analysis
  • Automation tools and cloud infrastructure scripting
  • Security measures (encryption, access controls)
  • Compliance with industry standards and regulations
  • Project management and delegation
  • Mentoring junior engineers

Nice to have

  • Access Governance product development
  • Elastic computing environments
  • Fault-injection and brown-out testing
  • Load-shedding, throttling, rate-limiting
  • Cross-functional collaboration

What the job description emphasized

  • highly scalable distributed systems
  • massive workloads
  • complex software programs
  • high-performance platforms
  • unpredictable network failures
  • complex engineering challenges
  • heavy data traffic
  • seamless, zero-downtime upgrades
  • deep system monitoring
  • rigorous resilience testing
  • hands-on problem-solving
  • horizontally and vertically scalable distributed systems
  • hyper-scale data processing
  • high-throughput data storage and retrieval
  • elastic computing environments
  • rigorous performance and load testing
  • dynamic system demands
  • fault-tolerant platforms
  • network disruptions
  • zero-downtime maintenance
  • redundancy
  • data replication
  • automatic failover
  • advanced traffic management strategies
  • load-shedding
  • throttling
  • rate-limiting
  • strict service level objectives
  • comprehensive key performance indicators
  • sophisticated telemetry
  • dashboards
  • proactive alerting mechanisms
  • system health
  • complex testing scenarios
  • fault-injection
  • brown-outs
  • complex production issues
  • on-call incident response
  • root cause investigations
  • automation tools
  • cloud infrastructure scripts
  • safe patching
  • updates
  • seamless rollbacks
  • robust security measures
  • encryption
  • access controls
  • multi-tenant environments
  • strict compliance with industry standards and regulations
  • complex project timelines
  • efficiently delegating tasks
  • prioritizing workloads
  • multiple engineering initiatives
  • cross-functional stakeholders
  • technical solutions
  • core business objectives
  • continuous improvement
  • engineering workflows
  • advanced problem-solving strategies
  • elevate overall team capabilities
  • mentoring junior engineers
  • sharing industry best practices
  • actively participating in candidate evaluations
  • high-performing talent pipeline