Staff Engineer, Data Platform

Lila Sciences · AI Frontier · Alewife, Cambridge, MA +1 · Software

The Staff Engineer sets technical direction for the core data infrastructure (ingestion, storage, orchestration, interfaces) that supports scientific discovery and ML research. The role involves designing and evolving the data platform architecture, building reliable pipelines, ensuring observability, defining data models, and providing technical leadership and mentorship.

What you'd actually do

  1. Design and evolve the core data infrastructure that ingests, stores, and serves data across scientific and ML workflows. Make principled build-vs-buy decisions and establish architectural patterns adopted by the broader engineering organization.
  2. Build reliable pipelines that bring in data from diverse sources: laboratory instruments, public scientific datasets, and external research literature. Own the interfaces between upstream producers and downstream consumers.
  3. Operate and extend workflow orchestration systems that run complex, multi-step scientific pipelines. Ensure observability, fault tolerance, and reproducibility across the data stack.
  4. Define and maintain data models, schema evolution practices, and data contracts that ensure consistency, discoverability, and long-term durability of scientific and platform data assets.
  5. Partner with ML researchers, lab scientists, and product engineers to translate scientific and research requirements into platform capabilities. Drive alignment on data standards and integration patterns across teams.

Skills

Required

  • Python
  • SQL
  • production-quality code
  • relational and NoSQL databases
  • schema design
  • query optimization
  • cloud infrastructure and containerized deployment (AWS, Kubernetes)
  • modern table formats and open lakehouse patterns (Iceberg, Delta Lake, Hudi)

Nice to have

  • workflow orchestration systems (Flyte, Airflow, Dagster, or similar)
  • data infrastructure that serves agentic and LLM-driven workflows
  • vector databases
  • RAG infrastructure
  • retrieval-optimized data access patterns
  • scientific computing
  • life sciences
  • research software
  • AI-assisted development tools (Cursor, Claude Code, or similar)

What the JD emphasized

  • Designed and shipped data platform components from the ground up
  • Proven track record of working cross-functionally with scientists, ML researchers, and engineers

Other signals

  • data platform
  • infrastructure
  • ML researchers
  • scientific discovery