Principal Data Scientist, R&D Oncology

Johnson & Johnson · Pharma · Spring House, PA +7

This role focuses on designing, developing, and maintaining data pipelines and AI-ready data systems for Oncology R&D within Johnson & Johnson. It involves data acquisition, management, storage, and optimization using cloud technologies and best practices, supporting data science partners and ensuring data quality and traceability.

What you'd actually do

  1. Design, develop and maintain data pipelines for acquiring, managing and storing Oncology R&D data from diverse sources (e.g. biomarker labs, real-world data sources, pre-clinical applications)
  2. Work closely with Data Science and Oncology R&D partners to understand, document and prioritize business requirements. Translate these business needs into high-quality data products.
  3. Work closely with other technical leaders, such as Ontology and Knowledge Graph Engineers, to design and deliver future-proof, AI-ready data systems aligned with Oncology R&D business needs.
  4. Develop Oncology R&D-specific data repositories by implementing standard enterprise-level data models and creating new data models as needed. Leverage cloud-based technology platforms to accomplish goals such as building and maintaining data repositories using AWS S3.
  5. Create and optimize data flows for structured and unstructured data using technologies such as Python, R, SQL, AWS services and other relevant tools.

Skills

Required

  • 3+ years of experience in data engineering
  • data modeling
  • database design
  • Proficiency in data engineering tools such as Python, R and SQL for data processing
  • cloud architecture (e.g. AWS services, Redshift, FSx, Glue, Lambda)
  • Experience with unstructured database technologies (e.g. NoSQL)
  • Strong skills in analysis, problem-solving, organizational change, project delivery, and managing external vendors
  • Proven record leading improvement initiatives with multi-disciplinary and remote partners
  • Demonstrated stakeholder management capabilities, including requirements gathering, business analysis and planning
  • Ability to manage numerous projects simultaneously, prioritize work, and exhibit organizational skills and flexibility

Nice to have

  • Advanced degree (Master’s or equivalent) in Computer Science, Engineering, Life Sciences, or other relevant field
  • Experience with healthcare data standards (e.g. CDISC, HL7, FHIR, SNOMED CT, OMOP, DICOM)
  • Exposure to high-dimensional data technologies and handling, including imaging
  • Familiarity with machine learning operations (MLOps) and model deployment

What the JD emphasized

  • AI-ready data
  • data modeling
  • database design
  • cloud architecture
  • healthcare data standards