Principal Data Engineer – Safety Analytics (Global Medical Safety)

Johnson & Johnson · Pharma · Horsham, PA (+2 other locations)

Principal Data Engineer for Global Medical Safety, focused on building and enabling modern safety analytics tools that use AI, machine learning, and GenAI on GCP. Responsibilities include end-to-end data engineering, feature engineering, retrieval-augmented generation (RAG) implementation, API and microservice development, and ensuring GxP compliance for pharmacovigilance data.

What you'd actually do

  1. Design and maintain production-grade data pipelines and curated datasets that directly support pharmacovigilance activities, including safety monitoring, analytics, and regulatory reporting.
  2. Enable AI/ML and GenAI workflows for safety analytics, including:
      • Feature engineering and feature store enablement
      • Embeddings, vectorized representations, and semantic retrieval
      • Retrieval-Augmented Generation (RAG) patterns for safety analytics tools
  3. Own the end-to-end data lifecycle for safety analytics, from source system intake through transformation, serving, and downstream analytical consumption, ensuring data continuity, traceability, and integrity.
  4. Establish and enforce data quality, validation, lineage, and observability standards for safety analytics datasets.
  5. Apply GxP validation expertise to data pipelines, analytics services, and supporting infrastructure.

Skills

Required

  • Data pipeline design and maintenance
  • Data quality and governance
  • Cloud data platforms (GCP: BigQuery, Dataform)
  • API and microservices development
  • Infrastructure as Code (Terraform)
  • CI/CD pipelines
  • GxP validation and regulatory compliance

Nice to have

  • Machine Learning
  • GenAI
  • Feature stores
  • Embeddings
  • RAG
  • Semantic retrieval

What the JD emphasized

  • production-grade
  • reproducible, explainable, and trusted analytics outputs
  • GxP validation expertise
  • regulatory use
  • inspection readiness

Other signals

  • AI/ML and GenAI workflows
  • Feature engineering and feature store enablement
  • Embeddings, vectorized representations, and semantic retrieval
  • Retrieval-Augmented Generation (RAG) patterns
  • APIs and microservices-based architectures to operationalize safety analytics and ML capabilities