AI Data Engineer - LLMs / Agentic Systems

Pfizer · Pharma · MA

This role focuses on building and deploying production-grade full-stack applications that integrate LLM and AI capabilities into pharmaceutical research workflows. Responsibilities include developing backend services for data processing, embedding generation, vector search, and LLM orchestration; creating frontend interfaces; implementing RAG systems and agentic LLM architectures; and deploying and maintaining systems on AWS. The role also involves contributing to semantic frameworks and conceptual research.

What you'd actually do

  1. Design and implement production-grade full-stack applications that integrate LLM and AI capabilities into scientific workflows, so researchers can use AI in their daily work
  2. Collaborate directly with medicinal chemists, biomedical researchers, and domain experts to understand requirements, translate scientific challenges into technical solutions, and deliver intuitive, user-centric applications
  3. Develop scalable backend services using Python frameworks for data processing, embedding generation, vector search, and LLM orchestration that power AI-driven research tools
  4. Create responsive frontend interfaces using React and TypeScript that improve researcher productivity
  5. Implement retrieval-augmented generation (RAG) systems, conversational AI interfaces, and agentic LLM architectures that automate knowledge work in pharmaceutical research

Skills

Required

  • Python
  • TypeScript
  • React
  • FastAPI
  • AWS
  • LLM frameworks and libraries
  • vector databases
  • semantic search technologies

Nice to have

  • life sciences
  • pharmaceutical research
  • drug discovery
  • cheminformatics
  • prompt engineering
  • LLM optimization techniques
  • conversational AI interfaces
  • chatbots
  • agentic systems
  • MongoDB
  • PostgreSQL
  • deep learning models
  • natural language processing
  • computer vision
  • PyTorch
  • EC2
  • S3
  • CloudFormation
  • RDS
  • DevOps best practices
  • Docker
  • CI/CD pipelines
  • automated testing
  • deployment automation tools
  • GitHub Actions
  • Jenkins
  • GitLab CI

What the JD emphasized

  • production-grade full-stack applications
  • production-quality software
  • GitHub portfolio required

Other signals

  • design, develop, and deploy intelligent systems
  • integrate LLM and AI capabilities into scientific workflows
  • development of scalable backend services using Python frameworks for data processing, embedding generation, vector search, and LLM orchestration
  • implementation of retrieval-augmented generation (RAG) systems, conversational AI interfaces, and agentic LLM architectures