Software Developer 3

Oracle · Enterprise · Bengaluru, Karnataka, India

Senior Software Engineer role focused on building and evolving a next-generation Data Platform for Oracle Health Data Intelligence (HDI). The role involves designing and operating lakehouse and batch processing systems, including ETL/ELT pipelines, data quality validation, and metadata management on OCI, to serve high-quality healthcare data for analytics and AI workloads.

What you'd actually do

  1. Build and operate batch-first ETL/ELT pipelines that ingest and transform data into curated lakehouse layers (e.g., raw → refined → curated).
  2. Design scalable data processing jobs using distributed compute frameworks (e.g., Spark/Beam) with strong attention to correctness, performance, and cost.
  3. Contribute to the architecture and evolution of our data lakehouse, including data layout/partitioning, compaction strategies, schema evolution, and backfills/reprocessing.
  4. Develop and maintain platform components for metadata management, dataset publishing, and pipeline orchestration.
  5. Implement data quality validation, lineage/metadata capture, and operational best practices (SLAs/SLOs, alerting, runbooks, auditing).

Skills

Required

  • Java
  • Python
  • Apache Spark
  • Apache Beam
  • lakehouse concepts
  • data platform concepts
  • partitioning
  • schema management
  • incremental processing
  • file formats
  • dataset versioning
  • cloud data services
  • Object Storage
  • Oracle BDS
  • data modeling
  • governance fundamentals
  • pipeline observability
  • metrics
  • logs
  • alerts
  • job monitoring
  • troubleshooting production workflows

Nice to have

  • feature store
  • metadata catalogs
  • data discovery/governance tooling
  • semantic indexing
  • vector search
  • Oracle Database 23ai vector capabilities
  • building retrieval datasets for AI workloads
  • Docker
  • Kubernetes
  • CI/CD for data/compute workloads
  • healthcare domain exposure
  • regulated-data environments

What the JD emphasized

  • batch-first ETL/ELT pipelines
  • data lakehouse
  • metadata management
  • data quality validation
  • pipeline observability