Principal Scientist, Data Science – R&D DSDH - Therapeutics Development & Supply (TDS)

Johnson & Johnson · Pharma · Madrid, Spain +2

Johnson & Johnson Innovative Medicine is seeking a Principal Scientist, Data Science – Data Engineer to design, build, and optimize data capture, processing, and storage solutions for advanced analytics, digital process transformation, and AI/ML applications within the Therapeutics Development & Supply (TDS) continuum. The role involves creating robust data pipelines and repositories, ensuring data is AI-ready, and partnering with data scientists and domain experts to deliver high-quality data products. Responsibilities include data acquisition, integration, management, quality implementation, and adherence to software development best practices in a regulated environment.

What you'd actually do

  1. Design, build, and maintain scalable data pipelines for acquiring, integrating, and managing TDS data from diverse data generation sources and systems (e.g., lab systems, MES, clinical supply, quality systems, external partners).
  2. Create and optimize data flows for structured and unstructured data using Python, R, SQL, cloud services, and other modern engineering tools.
  3. Develop and maintain TDS-specific data repositories, implementing enterprise-level data models and creating new models as needed.
  4. Enable AI/ML readiness by ensuring data is well-structured, versioned, traceable, and semantically aligned with enterprise data standards.
  5. Partner with data scientists, TDS domain experts, and digital technology teams to translate business needs into high-quality data products and engineering requirements.

Skills

Required

  • Python
  • R
  • SQL
  • cloud-based architectures (e.g., AWS services, Snowflake, Redshift)
  • NoSQL databases
  • graph databases
  • data modeling
  • database design
  • analytical skills
  • problem-solving skills
  • stakeholder-management skills

Nice to have

  • Experience with regulated or standards-driven data environments, such as CDISC, HL7, FHIR, OMOP, DICOM, or manufacturing/quality data standards.
  • Familiarity with high-dimensional data (e.g., imaging, sensor data).
  • Experience connecting data pipelines to, or feeding data into, MLOps and model deployment workflows.
  • Knowledge of manufacturing systems (MES), laboratory information systems, or industrial data systems.
  • Exposure to knowledge graph or ontology-driven architectures.

What the JD emphasized

  • AI/ML applications
  • AI-ready data pipelines
  • regulated or standards-driven data environments

Other signals

  • data products
  • data capture, processing, and storage solutions