Co-op, Data Extraction

Lila Sciences · AI Frontier · Alewife, Cambridge, MA · Physical Sciences AI

The role contributes to AI systems that extract knowledge from scientific literature and patents. Responsibilities include fine-tuning and evaluating language and multimodal models, building data-structuring pipelines, running extraction pipelines, analyzing results, and documenting findings. The goal is to ship work that integrates into production systems.

What you'd actually do

  1. Contribute to AI systems that extract and structure knowledge from scientific literature and patents, focused on a well-defined sub-problem
  2. Fine-tune and evaluate language, multimodal, or specialized models for data extraction, with mentor guidance
  3. Build and test pipelines that structure unstructured scientific data across text, tables, and visuals
  4. Run extraction pipelines, analyze results, and document findings clearly
  5. Share your work through a team presentation, write-up, or contribution to a publication or open-source project

Skills

Required

  • machine learning fundamentals
  • Python
  • NLP concepts
  • computer vision concepts

Nice to have

  • multimodal models
  • document understanding
  • working with messy, real-world datasets
  • scientific document parsing

What the JD emphasized

  • fine-tuning
  • evaluating extraction models
  • building pipelines
  • shipping work that flows into production systems
