Principal, Data Engineer

Walmart · Retail · Sunnyvale, CA

Principal Data Engineer responsible for developing and contributing to high-impact deliverables for Global Marketplace's enterprise data platform at global scale. The role operates at the intersection of retail scale, near real-time intelligence, and Agentic AI, translating platform strategy into business impact. Key responsibilities include transforming and evolving Walmart’s core data platform using Agentic AI, developing data platforms that scale to billions of events per day, and creating agent-ready data optimized for LLMs and multi-agent workflows. The role also involves influencing teams without direct authority, leading modernization of data lakes and pipelines, and mentoring engineers.

What you'd actually do

  1. Transform and evolve Walmart’s core data platform using Agentic AI across batch, streaming, and hybrid systems to support data-powered Global Marketplace applications, analytics, operational intelligence, and AI-native workloads.
  2. Develop data platforms that scale to billions of events per day, supporting both near real-time retail decisions and deep analytical insights.
  3. Develop agent-ready data, ensuring data products are discoverable, semantically rich, and optimized for LLMs, copilots, and multi-agent workflows.
  4. Influence multiple teams and organizations through technical leadership and clear architectural direction, without direct authority.
  5. Develop fault-tolerant, resilient systems that support mission-critical retail and marketplace operations.

Skills

Required

  • 10–15+ years building and evolving large-scale distributed data platforms
  • Proven experience designing systems that operate at global retail or consumer-scale
  • Strong experience with cloud-native ecosystems (GCP or Azure preferred), including BigQuery, serverless services, Pub/Sub, or equivalents
  • Expertise in batch and streaming technologies (Kafka, Spark Structured Streaming, Flink, Druid, etc.)
  • Experience working with hybrid architectures that support both real-time operations and analytical workloads
  • Strong understanding of semantic modeling, embeddings, knowledge graphs, and vector indexing
  • Experience supporting RAG, context-aware AI, and agent orchestration through data platform design
  • Ability to reason about schema design, latency, storage formats, and their impact on AI behavior and outcomes
  • Fluency in Python, Java, or Scala
  • Deep experience with Spark/PySpark and large-scale SQL optimization
  • Strong systems thinking, performance tuning, and operational excellence mindset
  • Demonstrated ability to lead through influence in complex, matrixed organizations
  • Executive-level communication skills, with the ability to connect technical strategy to business and customer impact

Nice to have

  • GCP or Azure preferred

What the JD emphasized

  • Agentic AI
  • billions of events per day
  • agent-ready data
  • LLMs, copilots, and multi-agent workflows
  • mission-critical retail and marketplace operations
  • global retail or consumer-scale
  • RAG, context-aware AI, and agent orchestration
  • AI behavior and outcomes

Other signals

  • enterprise data platform
  • composable, event-driven, and agent-aware platforms