Consulting Member of Technical Staff

Oracle · Enterprise · Austin, TX +1

This role focuses on architecting, designing, and operating distributed systems for Oracle Cloud Infrastructure (OCI) compute control plane services, specifically Imaging and Container Registry. The position requires hands-on technical leadership in building highly available, scalable, and cost-efficient multi-tenant systems, driving operational excellence, and influencing the technical roadmap. While AI is mentioned as a context for problem-solving and learning, the core responsibilities are in cloud infrastructure and distributed systems engineering.

What you'd actually do

  1. Architect, design, and operate distributed, highly available, resilient systems built on multi-tenant, horizontally scalable, cost-efficient architectures that deliver consistent latency, throughput, and durability across OCI regions.
  2. Collaborate cross-functionally with the Compute, Storage, Networking, OKE, and Functions teams to deliver new platform features for the Imaging and Container Registry services, enforce secure-by-default designs, and improve overall service reliability.
  3. Mentor and guide engineers in distributed systems design, high-scale data processing, and operational excellence; set and raise engineering standards across multiple teams.
  4. Drive operational excellence by owning service-level objectives (availability, latency, durability) and reducing toil through automation, observability, and self-healing mechanisms.
  5. Own the full service lifecycle, from design and implementation through deployment, on-call, and continuous improvement, while maintaining high standards for code quality and reliability.

Skills

Required

  • Java
  • distributed systems design
  • high-scale data processing
  • operational excellence
  • Linux
  • operating systems
  • systematic problem-solving
  • communication skills
  • ownership
  • drive
  • service metrics
  • alarms
  • dashboards
  • service KPIs
  • alarming systems
  • automation
  • optimizations
  • enhancements

Nice to have

  • Scala
  • Python
  • data structures
  • algorithms
  • management
  • automation of end-to-end CPU/GPU lifecycles at scale
  • Cloud
  • CI/CD environments
  • Kubernetes
  • OS Images
  • Terraform
  • modern build tools and pipelines
  • multi-tenant, virtualized infrastructure
  • change control management
  • mature operating processes
  • Security
  • Identity
  • SSL
  • certificates
  • Database
  • Data Stores

What the JD emphasized

  • large scale, highly available distributed systems
  • operating distributed services at scale
  • Deep understanding of service metrics and alarms through the development of dashboards, service KPIs, and alarming systems
  • Proven ability to drive technical outcomes, take ownership of deliverables, and work independently in fast-evolving AI solution spaces.
  • Demonstrated problem-solving ability leveraging AI, distributed systems, and cloud-native application behaviors.