Principal Data & AI Engineer

Johnson & Johnson · Pharma · Limerick, Ireland +1

Principal Data Engineer role focused on building and evolving scalable, governed data platform capabilities for advanced analytics, machine learning, and Generative AI/Agentic AI solutions. The role involves architecting AI-ready data foundations; implementing RAG pipelines, embedding generation, vector indexing, and semantic search; and supporting multi-agent orchestration. It emphasizes LLMOps, MLOps, AI observability, and data governance for enterprise-scale AI workloads.

What you'd actually do

  1. Architect and build core data platform capabilities that enable enterprise AI, machine learning, and Generative AI workloads, delivering scalable, governed solutions with strong performance and cost efficiency.
  2. Build and operate scalable batch and streaming pipelines that ingest structured and unstructured data for AI training and inference workloads, implementing data quality checks, lineage capture, and clear SLAs.
  3. Design, implement, and harden reusable data services and APIs that support large language models (LLMs), knowledge retrieval systems, and AI-powered applications, meeting reliability and latency targets.
  4. Design, build, and productionize Retrieval-Augmented Generation (RAG) pipelines, enabling GenAI models to access trusted enterprise data with strong latency, reliability, and evaluation coverage.
  5. Build integrations that connect enterprise knowledge across data lakes, document stores, APIs, and enterprise systems, enabling secure retrieval and reuse in AI applications while reducing duplicated point solutions.

Skills

Required

  • Data engineering
  • AI platform engineering
  • Software engineering
  • Data architecture
  • Machine learning pipelines
  • Generative AI
  • Agentic AI
  • LLM integration
  • RAG implementation
  • Vector databases
  • Knowledge graphs
  • Semantic search
  • LLMOps
  • MLOps
  • AI observability
  • Data governance
  • CI/CD
  • Infrastructure-as-code

Nice to have

  • Experience with specific LLM frameworks
  • Experience with cloud platforms (AWS, Azure, GCP)
  • Experience with real-time data processing

What the JD emphasized

  • enterprise technical leadership
  • scalable, governed data platform capabilities
  • AI-ready data foundations
  • Retrieval-Augmented Generation (RAG)
  • vector search
  • knowledge graphs
  • semantic layers
  • LLM-powered applications
  • enterprise AI, machine learning, and Generative AI workloads
  • structured and unstructured data ingestion for AI training and inference workloads
  • large language models (LLMs)
  • knowledge retrieval systems
  • AI-powered applications
  • LLM abstraction
  • enterprise knowledge
  • Retrieval-Augmented Generation (RAG) pipelines
  • embedding generation, vector indexing, and semantic search capabilities
  • AI copilots, conversational AI solutions, and agent-based AI workflows
  • multi-agent orchestration and tool-enabled AI systems
  • context assembly pipelines
  • context-collection strategies
  • context pressure
  • context rot
  • data ingestion, transformation, and serving pipelines
  • data models and data products
  • CI/CD, automated testing, and infrastructure-as-code
  • self-verifiable agentic feedback loops
  • LLMOps and MLOps frameworks
  • AI systems
  • data lineage, prompt and model evaluation, monitoring, and performance tracking
  • enterprise data governance practices

Other signals

  • AI platform engineering
  • Generative AI
  • Agentic AI
  • LLM-powered applications
  • Retrieval-Augmented Generation (RAG)
  • vector search
  • knowledge graphs
  • semantic layers
  • LLMOps
  • MLOps
  • AI observability