Senior AI Data Engineer

Elastic · Enterprise · Spain · Sales Strategy & Operations

Senior AI Data Engineer responsible for building and maintaining the golden customer dataset and making it ready for downstream AI workflows (account research, lead scoring, churn signals, CSM briefings). The work covers canonical dataset design, enrichment pipelines, deduplication, entity resolution, validation systems, chunking, embedding strategy, metadata design, and source integration. The role also owns quality, lineage, monitoring, drift detection, and documentation for AI consumption. It requires fluency with GTM data, experience preparing data for RAG, embeddings, and AI agents, and experience using LLMs for data tasks, along with Python, SQL, cloud infrastructure, and orchestration. Working knowledge of the Elastic Stack (Elasticsearch, vector search, ESRE) is a nice to have.

What you'd actually do

  1. Build and maintain the golden customer dataset. Design the canonical dataset that unifies signals from across GTM systems into a single, governed source of truth — including the enrichment pipelines, deduplication, entity resolution, and validation systems that keep it accurate as sources land and drift.
  2. Make data AI-ready. Work with the RevOps Data Science team to prepare structured and unstructured data for downstream AI workflows — account research, lead scoring, churn signals, CSM briefings — covering chunking, embedding strategy, metadata design, and source integration across GTM systems, product telemetry, and third-party enrichment providers.
  3. Own quality and lineage. Implement monitoring, drift detection, and lineage tracking so anomalies surface before they reach a forecast, a dashboard, or a seller's inbox.
  4. Set the standard. Define how RevTech prepares data for AI consumption and document the schemas, pipelines, and contracts downstream teams depend on.
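
To make the deduplication and entity resolution work in item 1 concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of how Elastic's pipelines work: the record fields (`source`, `id`, `name`, `domain`), the suffix list, and the function names are all hypothetical. It groups records by web domain when one is present, falls back to a normalized company name otherwise, and merges each group into one golden record that keeps a list of contributing sources for lineage.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately short list of legal suffixes to ignore when matching.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "gmbh", "sl", "sa", "corp", "co"}


def normalize_name(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def match_key(record: dict) -> tuple:
    """Use the web domain as the match key; fall back to the normalized name."""
    domain = (record.get("domain") or "").lower().removeprefix("www.")
    if domain:
        return ("domain", domain)
    return ("name", normalize_name(record["name"]))


def resolve_entities(records: list[dict]) -> list[dict]:
    """Group records by match key and merge each group into one golden record."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)

    golden = []
    for members in groups.values():
        merged = {}
        for member in members:
            for field, value in member.items():
                if field in ("source", "id"):
                    continue
                # Keep the first non-empty value seen for each field.
                if value and not merged.get(field):
                    merged[field] = value
        # Record which source rows contributed, as a simple lineage trail.
        merged["sources"] = sorted(f"{m['source']}:{m['id']}" for m in members)
        golden.append(merged)
    return golden


# Made-up sample rows from three hypothetical source systems.
records = [
    {"source": "crm", "id": "001", "name": "Acme, Inc.", "domain": "www.acme.com", "country": "ES"},
    {"source": "enrichment", "id": "a9", "name": "ACME Inc", "domain": "acme.com", "employees": 250},
    {"source": "telemetry", "id": "t7", "name": "Globex GmbH", "domain": None},
    {"source": "crm", "id": "002", "name": "Globex", "domain": ""},
]
golden = resolve_entities(records)
```

With the sample rows above, the four inputs collapse into two golden records. Exact-key matching like this is only a starting point: production entity resolution usually adds fuzzy matching, survivorship rules per field, and validation before a merge is accepted.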

Skills

Required

  • Python
  • senior-level SQL
  • cloud infrastructure (AWS, Azure, or GCP)
  • orchestration experience (Airflow, Dagster, or equivalent)
  • GTM data fluency
  • entity resolution
  • deduplication
  • AI-readiness experience
  • chunking strategies
  • metadata enrichment
  • embedding model selection
  • LLMs applied to data tasks (extraction, classification, normalization)
  • ability to evaluate whether those LLM-based tasks are working

Nice to have

  • Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field
  • Prior experience inside a RevOps, GTM Systems, or Marketing Operations engineering team
  • Working knowledge of Elasticsearch, vector search, and ESRE
  • genuine interest in building it

What the JD emphasized

  • 3+ years of experience building production pipelines that feed ML or LLM-based systems
  • prepared data for RAG, embeddings, and AI agents
  • used LLMs for extraction, classification, and normalization
  • know how to evaluate whether they're working

Other signals

  • data foundation
  • AI-ready dataset
  • customer data
  • Salesforce
  • enrichment pipelines
  • deduplication
  • entity resolution
  • validation systems
  • account scoring
  • segmentation
  • AI-powered workflows
  • PII governance
  • consent management
  • responsible AI practices
  • chunking
  • embedding strategy
  • metadata design
  • source integration
  • monitoring
  • drift detection
  • lineage tracking
  • schemas
  • pipelines
  • contracts
  • production pipelines
  • ML or LLM-based systems
  • GTM data
  • RAG
  • embeddings
  • AI agents
  • extraction
  • classification
  • normalization
  • evaluate whether they're working
  • Python
  • SQL
  • cloud infrastructure
  • orchestration
  • Elasticsearch
  • vector search
  • ESRE