Software Engineer, Data Foundations

Glean · Enterprise · Mountain View, CA · Engineering

This Software Engineer role sits on the Data Foundations team, which owns the end-to-end data ingestion and management layer powering Glean's Search, AI Assistant, and Agent products. The work involves building and scaling connectors, handling data transformations, and ensuring the quality, freshness, and trustworthiness of the data that AI systems rely on. A central focus is preserving access controls and sensitivity constraints whenever that data feeds LLM reasoning.

What you'd actually do

  1. Build and scale connectors to a wide variety of SaaS and on-prem systems (Google Workspace, Microsoft 365, Slack, Salesforce, Jira, ServiceNow, GitHub, etc.).
  2. Transform raw, unstructured enterprise content into rich, structured, permission-aware representations optimized for search and LLM reasoning.
  3. Own end-to-end correctness, freshness, and performance for petabyte-scale data flows.
  4. Preserve fine-grained ACLs, deletions, and sensitivity constraints so AI answers are always grounded in what users are actually allowed to see.
  5. Partner closely with Search Serving, Product, Platforms, and Security teams to define how enterprise context is exposed to LLMs and agents.

Skills

Required

  • 3+ years building production backend or data infrastructure systems (Java, Go, C++, Python, etc.)
  • Hands-on experience with distributed systems, data pipelines, queues, and large-scale storage (SQL/NoSQL)
  • Experience with SLOs, error budgets, failure modes, and correctness guarantees
  • Experience with strict consistency and permission-modeling challenges

Nice to have

  • Prior work on enterprise connectors
  • Prior work on search/indexing
  • Prior work on information retrieval
  • Prior work on security-sensitive systems
  • Power user of LLMs and AI tools in your own workflow

What the JD emphasized

  • end-to-end data ingestion and management layer
  • quality, freshness, and trustworthiness of the knowledge
  • permission-aware representations optimized for search and LLM reasoning
  • petabyte-scale data flows
  • fine-grained ACLs, deletions, and sensitivity constraints
  • enterprise trust

Other signals

  • building and scaling connectors
  • transforming unstructured content into structured representations for LLM reasoning
  • owning end-to-end correctness for petabyte-scale data flows
  • preserving fine-grained ACLs and sensitivity constraints for AI answers