Software Developer 5

Oracle · Enterprise · Seattle, WA (+1 location)

This is a Software Developer 5 role on the Oracle Kubernetes Engine (OKE) team, focused on building and operating massive-scale, integrated cloud services. The role involves enhancing OKE to support demanding AI and accelerated computing workloads, including multi-node GPU clusters and model training/inference platforms. The engineer will also apply modern agentic engineering practices to improve development velocity and operational efficiency. The role requires deep Kubernetes expertise and cloud infrastructure experience.

What you'd actually do

  1. Provide technical leadership for major OKE platform initiatives from architecture through implementation, launch, and production operation.
  2. Design and build distributed systems that create, update, scale, repair, and operate Kubernetes clusters across OCI regions.
  3. Improve OKE reliability, scalability, performance, upgrade safety, lifecycle management, observability, automation, and operational tooling.
  4. Work deeply with Kubernetes technologies, including control plane components, controllers/operators, scheduling, autoscaling, Kubernetes APIs, container runtimes, node behavior, and etcd.
  5. Design, debug, and improve Kubernetes networking and storage integrations, including CNI-based networking (Cilium, Calico, Flannel, and other container networking implementations), CSI drivers, and OCI infrastructure integrations.

Skills

Required

  • Kubernetes expertise
  • cloud infrastructure experience
  • distributed systems background
  • designing, building, operating, or debugging production cloud services/platforms
  • Kubernetes control plane behavior
  • controllers and operators
  • scheduling
  • autoscaling
  • networking
  • storage
  • service discovery
  • container runtimes
  • node lifecycle
  • Kubernetes APIs
  • etcd
  • Kubernetes networking technologies
  • CSI drivers
  • cloud provider integrations
  • agentic engineering practices
  • software engineering experience
  • building and operating production software systems

Nice to have

  • AI/ML infrastructure
  • multi-node GPU clusters
  • accelerated compute
  • model training or inference platforms
  • GPU scheduling
  • device plugins
  • Karpenter
  • cluster autoscaling
  • CUDA
  • NCCL
  • RoCE
  • InfiniBand
  • RDMA
  • SmartNIC/DPU offload
  • high-performance AI/HPC networking
  • CNI
  • Cilium
  • Calico
  • Flannel
  • experience with OCI, AWS, Azure, GCP, or a large-scale private cloud

What the JD emphasized

  • deep Kubernetes expertise
  • required cloud infrastructure experience
  • strong distributed systems background
  • hands-on experience designing, building, operating, or deeply debugging production cloud services, infrastructure platforms, or Kubernetes-based systems at meaningful scale
  • advanced Kubernetes experience
  • AI/ML infrastructure
  • multi-node GPU clusters
  • accelerated compute
  • model training or inference platforms
  • GPU scheduling
  • device plugins
  • Karpenter
  • cluster autoscaling
  • CUDA
  • NCCL
  • RoCE
  • InfiniBand
  • RDMA
  • SmartNIC/DPU offload
  • high-performance AI/HPC networking
  • agentic engineering practices
  • 10+ years of software engineering experience
  • hands-on cloud infrastructure experience (required)
  • strong hands-on Kubernetes expertise (required)