Sr Scientist, Data Science – R&D DSDH – Discovery Biologics

Johnson & Johnson · Pharma · Cambridge, MA, and 2 other locations

This role focuses on designing, building, and maintaining scalable data pipelines and workflows for biologics discovery, integrating diverse data sources and systems to increase the impact of AI/ML and advanced analytics on drug discovery. The primary focus is data engineering in support of scientific learning and decision-making.

What you'd actually do

  1. Design, develop, and maintain our Discovery Biologics data pipelines, integrating third-party solutions and data ingested from our external partners.
  2. Collaborate with data product owners, data scientists, analysts, architects, and other partners to understand data requirements and deliver high-quality solutions that enable Discovery processes for Biologics.
  3. Build integrations with Therapeutics Discovery systems, Discovery data repositories, external-partner data feeds, and other adjacent sources.
  4. Optimize data workflows for ease of use, performance, scalability, and reliability.
  5. Monitor the platform and resolve issues promptly.

Skills

Required

  • Python
  • PyTorch
  • TensorFlow
  • scikit-learn
  • RDKit
  • Data modeling
  • Workflow orchestration
  • ETL/ELT pipelines
  • AWS
  • GCP
  • Azure
  • Computational Biology
  • Bioinformatics
  • Data Science
  • Biomedical Engineering
  • Computer Science

Nice to have

  • Life science environment experience
  • Antibody or protein engineering experience
  • HPC integration
  • MLOps tools integration

What the JD emphasized

  • Advanced degree in Computational Biology, Bioinformatics, Data Science, Biomedical Engineering, Computer Science, or related fields.
  • Experience applying ML/AI in scientific domains (drug discovery, biology, chemistry, systems biology, or related areas).
  • Strong programming skills in Python (preferred) and experience with scientific/ML libraries (PyTorch, TensorFlow, scikit-learn, RDKit, etc.).
  • Practical experience with data engineering, including data modeling, workflow orchestration, ETL/ELT pipelines, and cloud computing environments (AWS, GCP, or Azure).

Other signals

  • design, build, and maintain a configurable, user-friendly, scalable, and integrated suite of data pipelines and workflows to drive scientific learning in Therapeutics Discovery
  • enhance the impact of AI/ML and advanced analytics in our drug discovery processes