Senior Member of Technical Staff

Oracle · Enterprise · Bengaluru, Karnataka, India

This role focuses on building and evolving a data platform for AI agents, centered on lakehouse storage and batch processing. The engineer will design and implement scalable ETL/ELT pipelines and data platform services on OCI to ingest, transform, curate, and serve healthcare data for analytics and AI workloads. Responsibilities include building batch pipelines, contributing to the data lakehouse architecture, developing metadata management components, implementing data quality validation, and optimizing pipelines for reliability and efficiency. The role requires strong programming skills in Java/Python, experience with distributed compute frameworks such as Spark/Beam, and knowledge of cloud data services and data modeling.

What you'd actually do

  1. Build and operate batch-first ETL/ELT pipelines that ingest and transform data into curated lakehouse layers (e.g., raw → refined → curated).
  2. Design scalable data processing jobs using distributed compute frameworks (e.g., Spark/Beam) with strong attention to correctness, performance, and cost.
  3. Contribute to the architecture and evolution of our data lakehouse, including data layout/partitioning, compaction strategies, schema evolution, and backfills/reprocessing.
  4. Develop and maintain platform components for metadata management, dataset publishing, and pipeline orchestration.
  5. Implement data quality validation, lineage/metadata capture, and operational best practices (SLAs/SLOs, alerting, runbooks, auditing).

Skills

Required

  • 4-7 years of relevant industry experience in software engineering and/or data engineering.
  • Strong programming skills in Java/Python, with solid software engineering fundamentals (OO/design, testing, debugging, performance).
  • Hands-on experience building large-scale batch pipelines using Apache Spark (preferred) and/or Apache Beam (or equivalent).
  • Experience with lakehouse/data platform concepts: partitioning, schema management, incremental processing, file formats (Parquet/ORC), and dataset versioning.
  • Exposure to cloud data services (OCI preferred) such as Object Storage, compute, networking/IAM, and managed data/processing services (e.g., Oracle BDS or equivalents on AWS/GCP/Azure).
  • Strong understanding of data modeling and governance fundamentals (access controls, auditing, retention, PII handling concepts).
  • Practical experience with pipeline observability: metrics, logs, alerts, job monitoring, and troubleshooting production workflows.

Nice to have

  • Experience with feature stores, metadata catalogs, or data discovery/governance tooling.
  • Familiarity with semantic indexing / vector search (e.g., Oracle Database 23ai vector capabilities) and/or building retrieval datasets for AI workloads.
  • Experience with Docker and Kubernetes; CI/CD for data/compute workloads.
  • Healthcare domain exposure and comfort operating in regulated-data environments.

What the JD emphasized

  • healthcare data
  • AI workloads
  • batch-first ETL/ELT pipelines
  • data lakehouse
  • metadata management
  • data quality validation
  • pipeline observability
  • regulated-data environments