Principal Data Scientist – R&D DSDH - Therapeutics Discovery (TD)

Johnson & Johnson · Pharma · Spring House, PA +4

The Principal Data Scientist will build and apply advanced Machine Learning (ML) and Data Engineering solutions to accelerate scientific innovation in drug discovery. This role involves developing ML/AI models for target prioritization, multi-omics integration, and mechanistic inference, applying modern ML approaches to diverse datasets, and building robust data pipelines. The scientist will collaborate with discovery teams to translate experimental data into actionable insights and deploy models into production R&D environments.

What you'd actually do

  1. Develop ML/AI models that support discovery workflows, including target prioritization, multi‑omics integration, and mechanistic inference.
  2. Apply modern ML approaches (e.g., deep learning, graph learning, foundation models, generative models) to chemical, biological, imaging, and assay datasets.
  3. Build and optimize models for real‑world R&D use cases, ensuring scalability, interpretability, and scientific rigor.
  4. Design, build, and maintain robust data pipelines that curate, standardize, and integrate diverse R&D datasets (chemical, biological, multi‑omics, imaging, biophysical, automation logs, etc.).
  5. Partner with platform teams to implement best‑practice MLOps/DevOps workflows and deploy ML models into production R&D environments.

Skills

Required

  • Python
  • PyTorch
  • TensorFlow
  • scikit-learn
  • RDKit
  • Data Engineering
  • MLOps
  • DevOps
  • Cloud Computing (AWS, GCP, or Azure)

Nice to have

  • Ph.D.
  • Computational Biology
  • Bioinformatics
  • Data Science
  • Chemistry
  • Chemical Biology
  • Biomedical Engineering
  • Computer Science
  • Drug Discovery
  • Biology
  • Systems Biology
  • Imaging
  • Pharma or Biotech Discovery
  • Target Assessment
  • Phenotypic Screening
  • Medicinal Chemistry Workflows
  • Lab Automation
  • Omics
  • High-content Imaging
  • Chemical Structure Data
  • Biological Assay Data
  • FAIR Data Standards
  • Ontologies
  • Controlled Vocabularies
  • Regulated Environments
  • Quality-governed Environments

What the JD emphasized

  • Master’s or Ph.D. in Computational Biology, Bioinformatics, Data Science, Chemistry, Chemical Biology, Biomedical Engineering, Computer Science, or a related field.
  • Experience applying ML/AI in scientific domains (drug discovery, biology, chemistry, systems biology, imaging, or related areas).
  • Strong programming skills in Python (preferred) and experience with scientific/ML libraries (PyTorch, TensorFlow, scikit‑learn, RDKit, etc.).
  • Practical experience with data engineering, including data modeling, workflow orchestration, ETL/ELT pipelines, and cloud computing environments (AWS, GCP, or Azure).
