Principal Scientist, Data Science – R&D DSDH - Therapeutics Development & Supply (TDS)

Johnson & Johnson · Pharma · Spring House, PA +4

The Principal Scientist, Data Science – R&D DSDH role at Johnson & Johnson focuses on designing, building, and optimizing data capture, processing, and storage solutions to enable advanced analytics, digital process transformation, and AI/ML applications within the Therapeutics Development & Supply (TDS) continuum. This involves creating AI-ready data pipelines and data products, managing diverse data sources, and ensuring data quality and compliance for scientific, technical, and operational decision-making in a healthcare R&D environment.

What you'd actually do

  1. Design, build, and maintain scalable data pipelines for acquiring, integrating, and managing TDS data from diverse data generation sources and systems (e.g., lab systems, MES, clinical supply, quality systems, external partners).
  2. Create and optimize data flows for structured and unstructured data using Python, R, SQL, cloud services, and other modern engineering tools.
  3. Develop and maintain TDS-specific data repositories, implementing enterprise-level data models and creating new models as needed.
  4. Enable AI/ML readiness by ensuring data is well-structured, versioned, traceable, and semantically aligned with enterprise data standards.
  5. Partner with data scientists, TDS domain experts, and digital technology teams to translate business needs into high-quality data products and engineering requirements.

Skills

Required

  • Python
  • R
  • SQL
  • cloud-based architectures (e.g., AWS services, Snowflake, Redshift)
  • NoSQL databases
  • graph databases
  • data modeling
  • database design
  • analytical skills
  • problem-solving skills
  • stakeholder management skills

Nice to have

  • Experience with regulated or standards-driven data environments, such as CDISC, HL7, FHIR, OMOP, DICOM, or manufacturing/quality data standards.
  • Familiarity with high-dimensional data (e.g., imaging, sensor data).
  • Experience connecting data pipelines to, or feeding, MLOps and model deployment workflows.
  • Knowledge of manufacturing systems (MES), laboratory information systems, or industrial data systems.
  • Exposure to knowledge graph or ontology-driven architectures.

What the JD emphasized

  • AI/ML applications
  • AI-ready data pipelines
  • data products
  • data repositories
  • data quality
  • compliance
  • traceability
  • audit readiness
  • regulated or standards-driven data environments