Software Developer 5

Oracle · Enterprise · United States

The Software Developer 5 at Oracle provides technical leadership for Oracle’s messaging and eventing ecosystem, including Oracle Streaming, Oracle Queue, and Oracle Streaming Service with Apache Kafka. The role defines architecture, reliability, and scalability strategies for the core services that enable event-driven and streaming workloads on OCI. Responsibilities include architecting, designing, and operating distributed systems; defining technical roadmaps; leading system design for scalable architectures; collaborating with cross-functional teams; mentoring engineers; driving operational excellence; owning the full service lifecycle; and partnering with product management.

What you'd actually do

  1. Architect, design, and operate distributed, highly available, and resilient systems supporting real-time data ingestion, message queuing, and stream processing at massive scale.
  2. Define and drive the technical roadmap for Streaming, Queue, and Managed Kafka services.
  3. Lead system design for multi-tenant, horizontally scalable, and cost-efficient architectures that deliver consistent latency, throughput, and durability across OCI regions.
  4. Collaborate cross-functionally with storage, networking, observability, and security teams to deliver new platform features, enforce secure-by-default designs, and improve overall fleet reliability.
  5. Mentor and guide engineers in distributed systems design, high-scale data processing, and operational excellence; set and raise engineering standards across multiple teams.

Skills

Required

  • Apache Kafka
  • Raft/ZooKeeper/KRaft internals
  • Message queuing systems (RabbitMQ, ActiveMQ)
  • AMQP protocols
  • Kubernetes
  • Java
  • Go
  • Distributed systems
  • Cloud platforms (OCI, AWS, Azure, GCP)
  • Terraform
  • CI/CD

Nice to have

  • Tier-0 or mission-critical services
  • Stringent SLAs
  • Open-source messaging systems (Kafka, RabbitMQ, Pulsar, Flink)
  • Observability stacks (Prometheus, OpenTelemetry, Grafana)
  • SLOs, SLIs, error budgets
  • OCI-specific services
  • IAM integration
  • Region/fault-domain isolation models

What the JD emphasized

  • 15+ years of professional experience developing and operating large-scale, distributed systems or cloud-native services.
  • Deep expertise in Apache Kafka, including Raft/ZooKeeper/KRaft internals, performance, latency, and the operation of production Kafka clusters at scale.
  • Strong hands-on experience with message queuing systems such as RabbitMQ, ActiveMQ, or equivalent enterprise queue technologies, including understanding of AMQP protocols and queue semantics (FIFO, DLQ, fan-out, and priority).
  • Hands-on experience with Kubernetes, including deployment, scaling, and operating stateful workloads in containerized environments.
  • Proficiency in Java, Go, or similar object-oriented languages; ability to produce high-quality, performant, and maintainable code.
  • Experience operating at scale: production debugging, performance tuning, capacity modeling, and regional failover strategies.
  • Demonstrated technical leadership, influencing architecture and execution across multiple teams, and mentoring other senior engineers. Excellent communication skills, able to articulate complex designs and trade-offs clearly across engineering and product stakeholders.