Advisor, Data Scientist - CMC Data Products

Eli Lilly · Pharma · Indianapolis, IN

The role focuses on developing and delivering enterprise-scale data products that power AI-driven insights, process optimization, and regulatory compliance within the pharmaceutical domain. It involves defining data archetypes, creating reusable data models, and implementing data frameworks for regulated environments. The core responsibility is building AI-ready data products, including training datasets for various AI/ML applications and supporting generative AI for knowledge management.

What you'd actually do

  1. Define the roadmap and deliver analysis-ready and AI-ready data products that enable AI/ML applications, PAT systems, near-time analytical testing, and process intelligence across CMC workflows.
  2. Define pharmaceutical-specific data archetypes (process, analytical, quality, CMC submission) and create reusable data models aligned with industry standards (ISA-88, ISA-95, CDISC, eCTD).
  3. Implement data frameworks that ensure 21 CFR Part 11, ALCOA+, and data integrity compliance, while enabling scientific innovation and self-service access.
  4. Build training datasets for lab automation, process optimization, and predictive CQA models, and support generative AI applications for knowledge management and regulatory Q&A.
  5. Collaborate with analytical R&D, process development, manufacturing science, quality, and regulatory affairs to standardize data products.

Skills

Required

  • Master’s degree in Computer Science, Data Science, Machine Learning, AI, or related technical field
  • 8+ years of product management experience focused on data products, data platforms, or scientific data systems
  • Strong grasp of modern data architecture patterns (data warehouses, data lakes, real-time streaming)
  • Knowledge of modern data stack technologies (Microsoft Fabric, Databricks, Airflow) and cloud platforms (AWS: S3, RDS, Lambda/Glue; Azure)
  • Demonstrated experience designing data products that support AI/ML workflows and advanced analytics in scientific domains
  • Proficiency with SQL, Python, and data visualization tools
  • Experience with analytical instrumentation and data systems (HPLC/UPLC, spectroscopy, particle characterization, process sensors)
  • Knowledge of pharmaceutical manufacturing processes, including batch and continuous manufacturing, unit operations, and process control
  • Expertise in data modeling for time-series, spectroscopic, chromatographic, and hierarchical batch/lot data
  • Experience with laboratory data management systems (LIMS, ELN, SDMS, CDS) and their integration patterns

Nice to have

  • Understanding of Design of Experiments (DoE), Quality by Design (QbD), and process validation strategies
  • Experience implementing data mesh architectures in scientific organizations
  • Knowledge of MLOps practices and model deployment in validated environments
  • Familiarity with regulatory submissions (eCTD, CTD) and how analytical data supports marketing applications
  • Experience with CI/CD pipelines (GitHub Actions, CloudFormation) for scientific applications

What the JD emphasized

  • 21 CFR Part 11
  • ALCOA+
  • data integrity compliance
  • GxP compliance
  • audit readiness

Other signals

  • AI/ML applications
  • training datasets for lab automation
  • process optimization
  • predictive CQA models
  • generative AI applications