Lead Data Engineer, AI

Salesforce · Enterprise · San Francisco, CA +3

Salesforce is seeking a Lead Data Engineer to build and scale the data infrastructure for its AI agents, focusing on retrieval systems and on grounding agents in accurate, real-time data. The role involves architecting the semantic layer and embedding pipelines, and ensuring data quality for agentic AI.

What you'd actually do

  1. Design Search & Retrieval Systems — Build robust search indices that enable AI agents to perform complex, high-precision retrievals across the Salesforce data ecosystem.
  2. Architect the Agentic Retrieval Layer — Serve as primary architect for the semantic layer and embedding pipelines that ground Agentic AI in Customer Success data.
  3. Build Inference Infrastructure — Partner with Decision Scientists to develop specialized infrastructure for attribution and causal modeling.
  4. Drive Operational Excellence — Set and enforce rigorous standards for data quality, latency, and index freshness so agents deliver reliable, real-time insights.
  5. Lead AI Integration & Automation — Automate the data delivery pipeline, ensuring seamless integration across internal databases, third-party APIs, and the AI orchestration layer.

Skills

Required

  • Python
  • SQL
  • Spark
  • ETL/ELT tools (Airflow, dbt, Informatica)
  • Data modeling
  • Database concepts
  • Data warehousing (SQL and NoSQL)
  • Cloud data platforms (AWS, Azure, or Google Cloud)
  • Salesforce ecosystem, including Data Cloud
  • Technical degree

Nice to have

  • Semantic search indices
  • Embedding pipelines
  • Retrieval-augmented generation (RAG) systems
  • Vector databases
  • AI/ML infrastructure
  • Supporting Decision Science or Data Science teams on causal or attribution modeling
  • Mentoring engineers
  • Driving technical standards

What the JD emphasized

  • primary architect
  • retrieval systems
  • agentic retrieval layer
  • embedding pipelines
  • data delivery pipeline

Other signals

  • AI agents
  • retrieval systems
  • semantic layer
  • embedding pipelines
  • data delivery pipeline