Software Developer 5

Oracle · Enterprise · Nashville, TN +1

This role focuses on building and operating Oracle Cloud Infrastructure's observability platform: large-scale distributed systems that ingest, process, store, and query telemetry data. It is a core engineering role within a mature cloud infrastructure offering.

What you'd actually do

  1. Lead the design, development, and operation of cloud-scale observability platforms supporting metrics, logs, traces, and related telemetry data.
  2. Architect and implement highly scalable, resilient, and cost-efficient telemetry collection, ingestion, processing, storage, and query systems.
  3. Drive the evolution of end-to-end observability pipelines, from instrumentation and data collection through real-time analytics and long-term retention.
  4. Design and optimize distributed systems capable of ingesting and processing massive volumes of telemetry data with stringent latency and availability requirements.
  5. Develop scalable storage and indexing solutions for high-cardinality metrics, large-scale log analytics, and distributed tracing workloads.

Skills

Required

  • design, development, and operation of cloud-scale observability platforms
  • metrics, logs, traces, and related telemetry data
  • highly scalable, resilient, and cost-efficient telemetry collection, ingestion, processing, storage, and query systems
  • end-to-end observability pipelines
  • distributed systems
  • high-throughput telemetry ingestion
  • large-scale data processing
  • cost-efficient storage
  • low-latency query execution
  • multi-tenant reliability
  • operational excellence
  • cloud-native observability platforms
  • massive volumes of telemetry data
  • stringent latency and availability requirements
  • scalable storage and indexing solutions
  • high-cardinality metrics
  • large-scale log analytics
  • distributed tracing workloads
  • fast, reliable, and intuitive access to observability data
  • performance bottlenecks
  • reliability, fault tolerance, scalability, security, and operational excellence
  • hyperscale cloud environments
  • technical strategy and architectural decisions
  • mentoring senior and junior engineers
  • technical leadership
  • engineering best practices
  • collaboration with product management, architects, SREs, and engineering teams
  • troubleshooting and root-cause analysis
  • emerging trends, technologies, and best practices in observability, distributed systems, data processing, and cloud-native architectures

Nice to have

  • AI/ML experience, such as model training, serving, or evaluation

What the JD emphasized

  • cloud-scale observability platforms
  • massive scale
  • distributed systems
  • high-throughput telemetry ingestion
  • large-scale data processing
  • cost-efficient storage
  • low-latency query execution
  • multi-tenant reliability
  • operational excellence
  • cloud-native observability platforms
  • massive volumes of telemetry data
  • stringent latency and availability requirements
  • scalable storage and indexing solutions
  • high-cardinality metrics
  • large-scale log analytics
  • distributed tracing workloads
  • fast, reliable, and intuitive access to observability data
  • performance bottlenecks
  • reliability, fault tolerance, scalability, security, and operational excellence
  • hyperscale cloud environments