Data Architect, Data Foundry

Eli Lilly · Pharma · San Francisco, CA +4

Seeking Data Architects to design and build the data infrastructure for AI-native drug discovery. This involves creating schemas, ontologies, data models, and platform architectures to transform raw scientific data into machine-actionable assets for discovery scientists and autonomous AI agents. The role focuses on data modeling, data platform architecture, and knowledge graph systems that enable AI-driven capabilities.

What you'd actually do

  1. Design and implement data models, schemas, and ontologies for chemical, biological, and automation-generated data that serve discovery workflows across the portfolio.
  2. Design and implement data lakehouse architecture using modern platforms (Databricks, Snowflake, or equivalent), including data storage patterns, partitioning strategies, and query optimization.
  3. Design and implement knowledge graphs (Neo4j, Amazon Neptune, TigerGraph) that capture molecular, target, pathway, and experimental relationships across the discovery landscape.
  4. Partner with scientific software engineers to ensure data architectures are implementable, performant, and well-documented.
  5. Collaborate with Methods4Insight to design data structures that support analytical model training, deployment, and evaluation.

Skills

Required

  • SQL
  • data modeling
  • data architecture
  • data engineering
  • scientific informatics
  • database paradigms (relational, graph, document, columnar, key-value)

Nice to have

  • Databricks
  • Snowflake
  • Spark
  • dbt
  • Neo4j
  • Amazon Neptune
  • TigerGraph
  • MongoDB
  • TileDB
  • Kafka
  • Kinesis
  • RDF
  • OWL
  • SPARQL
  • cloud platforms (AWS, Azure, or GCP)
  • modern data integration patterns
  • genomics/imaging data
  • LLM
  • RAG

What the JD emphasized

  • AI-native drug discovery
  • autonomous AI agents
  • data infrastructure
  • machine-actionable
  • FAIR-compliant
  • insight-ready assets
  • data models
  • platform architectures
  • knowledge graphs
  • vector databases
  • RAG workflows

Other signals

  • AI-native drug discovery
  • autonomous AI agents
  • data infrastructure for AI
  • ML and RAG workflows