Data Engineer II

Samsara · Enterprise · CA · Remote · Revenue Operations

A Data Engineer II role focused on building and optimizing the data platforms behind Samsara's go-to-market (GTM) AI engine. Responsibilities include managing Databricks data stores, enabling generative AI jobs, and ensuring clean data for AI applications. The work involves building ETL/ELT pipelines, developing data models, implementing data quality monitoring, and supporting RAG and vector database integrations.

What you'd actually do

  1. Build and maintain ETL/ELT data pipelines in Databricks and Spark, ensuring data is ingested, transformed, and delivered reliably for analytics and AI use cases.
  2. Develop and evolve logical and physical data models to support reporting, experimentation, and advanced workflows (e.g., scoring models, signal generation).
  3. Implement monitoring, alerts, and testing for data quality, timeliness, and lineage to ensure trustworthy data delivery.
  4. Support workflow orchestration with Databricks Jobs, dbt, or equivalent scheduling tools to operate at scale.
  5. Contribute to data pipelines and tooling that support retrieval-augmented generation (RAG), vector integrations, or embedding workflows.

Skills

Required

  • Data engineering
  • Databricks
  • dbt
  • Spark
  • Python
  • SQL
  • ETL/ELT pipeline design
  • Data modeling
  • Performance optimization
  • Cost-efficient infrastructure design
  • Workflow orchestration

Nice to have

  • Generative AI workflows
  • Vector databases
  • Embeddings
  • Retrieval systems
  • Salesforce
  • Gainsight
  • Gong
  • Outreach
  • Observability
  • Monitoring
  • Governance
  • AI/ML collaboration

What the JD emphasized

  • Significant experience building large-scale data platforms
  • Experience orchestrating data workflows at scale and enabling machine learning or AI use cases
  • Experience enabling generative AI workflows in Databricks or similar platforms
  • Familiarity with vector databases, embeddings, and retrieval systems
  • Exposure to observability, monitoring, and governance best practices for data and AI systems

Other signals

  • Data pipelines for AI
  • Databricks
  • Generative AI
  • RAG
  • Vector databases