Associate Principal Scientist, Preclinical Development

Merck · Pharma · NJ

This role focuses on creating next-generation mammalian cell lines for therapeutic development by designing synthetic genetic circuits, managing multi-omics datasets, and running next-generation sequencing (NGS) workflows. It involves applying machine learning (ML) or deep learning (DL) models to inform experimental design and predict cell line performance, alongside computational modeling of cellular pathways and metabolism. The role requires strong programming skills in Python or R for biological data analysis and experience with Linux/Unix environments.

What you'd actually do

  1. Design and optimize complex DNA vector constructs and synthetic gene circuits to enhance protein expression in CHO cells.
  2. Develop and apply computational models to understand cellular pathways and metabolism, predicting how genetic modifications impact global cell behavior and product quality.
  3. Build and maintain data pipelines to process high-dimensional datasets from high-throughput screening.
  4. Lead NGS workflows, from library preparation to bioinformatics analysis.
  5. Partner with Upstream, Downstream, and Analytical teams to deliver integrated CMC solutions.

Skills

Required

  • Python
  • R
  • Linux/Unix
  • Bioinformatics
  • Mammalian Cell Culture
  • Molecular Cloning
  • NGS
  • Data Analysis
  • Systems Biology
  • Chemical Engineering

Nice to have

  • Machine Learning (ML)
  • Deep Learning (DL)
  • High-throughput screening
  • Synthetic Biology
  • Gene Editing
  • Vector Constructs
  • CHO cells
  • IND/BLA filings

What the JD emphasized

  • Ph.D. in Molecular Biology, Bioinformatics, Systems Biology, Chemical Engineering, or a related field; or a Master’s degree with 5–8+ years of industry experience.
  • Strong programming skills in Python or R for biological data analysis.
  • Experience with Linux/Unix environments and command-line bioinformatics tools (BWA, GATK, Samtools).
  • Proven track record in library construction and data management for DNA-seq and RNA-seq workflows.

Other signals

  • Machine learning (ML) or deep learning (DL) models to inform experimental design and predict cell line performance
  • Computational models to understand cellular pathways and metabolism
  • Python or R for biological data analysis