Lead Principal Core Infrastructure Engineer

Oracle · Enterprise · Bengaluru, Karnataka, India

This role focuses on architecting, designing, and operating distributed, highly available, and resilient systems for multi-tenant, horizontally scalable, and cost-efficient architectures. The engineer will collaborate with other teams, mentor engineers, drive operational excellence, and define the technical roadmap for compute and control plane services. Although AI is mentioned in the soft skills section, the responsibilities center on core infrastructure and distributed systems engineering, not direct AI/ML model development or deployment.

What you'd actually do

  1. Architect, design, and operate distributed, highly available, and resilient systems for multi-tenant, horizontally scalable, and cost-efficient architectures that deliver consistent latency, throughput, and durability across OCI regions.
  2. Collaborate cross-functionally with Compute, Storage, Networking, OKE, and Functions to deliver new platform features focused on compute and control plane services, enforce secure-by-default designs, and improve overall service reliability.
  3. Mentor and guide engineers in distributed systems design, high-scale data processing, and operational excellence; set and raise engineering standards across multiple teams.
  4. Drive operational excellence by owning service-level objectives (availability, latency, durability) and reducing toil through automation, observability, and self-healing mechanisms.
  5. Own the full service lifecycle from design and implementation to deployment, on-call, and continuous improvement — maintaining high code and reliability standards.

Skills

Required

  • 12+ years of development experience with large-scale, highly available distributed systems
  • Proficiency in Java programming patterns
  • Advanced knowledge of data structures, algorithms, and operating systems
  • Experience with operating distributed services at scale
  • Expertise in Linux and operating systems
  • Systematic problem-solving approach
  • Strong communication skills
  • Strong ownership and drive
  • Deep understanding of service metrics and alarms, gained through developing dashboards, service KPIs, and alarming systems
  • Ability to propose, scope, design, and direct automation, optimizations, and enhancements
  • BS or MS degree in Computer Science/Engineering or a related IT field, or equivalent experience relevant to the functional area

Nice to have

  • Scala programming
  • Python programming

What the JD emphasized

  • large scale, highly available distributed systems
  • operating distributed services at scale
  • Deep understanding of service metrics and alarms
  • proven experience in solving cloud scale problems, distributed systems design & implementation experience