(USA) Senior Manager, Data Engineering

Walmart · Retail · Sunnyvale, CA

Senior Engineering Manager for Agentic Data at Sam's Club, responsible for evolving the data ecosystem into a Semantic & Contextual Layer for AI Agents. The role involves architecting a unified semantic layer, owning real-time streaming architectures for agentic intelligence, building agent-ready data pipelines, leading a team, and advancing retrieval systems for RAG at scale. The emphasis is on data engineering fundamentals, real-time processing, and operational excellence for autonomous AI workflows.

What you'd actually do

  1. Define the technical vision for a unified semantic layer that transforms structured, semi-structured, and unstructured data into context-aware representations (Embeddings, Knowledge Graphs, and Metadata) consumable by LLMs and multi-agent workflows.
  2. Own the streaming architecture that feeds our agents. You will leverage Kafka to build low-latency event-driven pipelines, ensuring agents have access to "up-to-the-second" member context and environmental state.
  3. Oversee the design of scalable schemas, partitioning strategies, and high-performance indexing that make petabyte-scale data accessible to reasoning models.
  4. Lead the design of "Agentic Data" pipelines, ensuring that data provided to AI agents is high-fidelity, optimized for retrieval, and enriched with the necessary business logic to prevent hallucinations.
  5. Manage and mentor a high-performing team of engineers. You will bridge the gap between AI/ML Research, Product, and Platform Engineering to align data roadmaps with long-term autonomous decision-making goals.

Skills

Required

  • Data Engineering
  • Leadership
  • Data Modeling
  • Distributed Computing
  • Kafka
  • Event-driven architectures
  • Vector Databases
  • Knowledge Graphs
  • Orchestration frameworks (LangChain, LlamaIndex, CrewAI)
  • GCP or Azure
  • BigQuery
  • Dataflow
  • Pub/Sub
  • Python
  • Java
  • Scala

Nice to have

  • semantic layer
  • contextual layer
  • AI Agents
  • LLMs
  • multi-agent workflows
  • Embeddings
  • Metadata
  • streaming architecture
  • low-latency event-driven pipelines
  • member context
  • environmental state
  • scalable schemas
  • partitioning strategies
  • high-performance indexing
  • petabyte-scale data
  • reasoning models
  • Agentic Data pipelines
  • high-fidelity data
  • optimized for retrieval
  • enriched with business logic
  • prevent hallucinations
  • AI/ML Research
  • Product
  • Platform Engineering
  • autonomous decision-making goals
  • Retrieval-Augmented Generation (RAG)
  • batch and streaming sources
  • Agentic Observability
  • telemetry
  • lineage
  • auditability patterns
  • autonomous and regulated AI workflows
  • Data Vault
  • storage formats (Parquet, Avro)
  • sharding
  • replication
  • CAP theorem
  • Kafka Streams
  • Flink
  • Spark Streaming
  • Pinecone
  • Milvus
  • context window management
  • AI reasoning
  • retrieval quality
  • testability
  • maintainability
  • high-performance system design

What the JD emphasized

  • architect the bridge between raw data and autonomous action
  • evolve our massive data ecosystem into a Semantic & Contextual Layer
  • build the "brain" of the retail experience
  • lead the transition from traditional data structures to agent-ready environments
  • Agentic Data
