(USA) Senior, Data Engineer - Full Stack

Walmart · Retail · Sunnyvale, CA

Senior Data Engineer role focused on building scalable data solutions to support AI/ML initiatives, specifically for agentic AI systems and LLM-based applications within Sam's Club. The role involves designing data pipelines, ensuring data discoverability and trustworthiness, and contributing to a data-first, agent-aware architecture. Emphasis on data modeling, data governance for agents, and enabling seamless integration between data and downstream AI agents for autonomous decision-making and recommendations.

What you'd actually do

  1. Design, build, test, and deploy scalable, intelligent data solutions that support millions of Sam’s Club customers—while laying the groundwork for Agentic AI systems that consume and act on this data.
  2. Partner with engineering, AI/ML, and product teams to ensure data services are discoverable, trustworthy, and consumable by next-gen AI agents and LLM-based systems.
  3. Collaborate across Sam’s Club engineering teams to contribute to a data-first, agent-aware architecture and foster a culture of innovation around intelligent automation.
  4. Engage with Product Management and Business teams to prioritize data products that will drive autonomous decision-making and context-rich recommendations.
  5. Build features that enable seamless integration between structured/unstructured data and downstream AI agents, enabling smarter responses, faster insights, and automation.

Skills

Required

  • 4–6 years of experience in Big Data development with a focus on scalable, fault-tolerant architectures.
  • 2–3 years of hands-on experience with cloud platforms such as GCP or Azure, including services like BigQuery, Dataflow, Pub/Sub, or equivalent.
  • Strong foundation in data engineering best practices and experience building complex data pipelines optimized for agent consumption.
  • Experience designing and implementing semantic layers or knowledge graphs that could power data-aware AI agents.
  • Proven experience in data modeling and architecture, with awareness of how data structures affect retrieval quality and contextual relevance for agents.
  • Understanding of data governance, including quality, access control, and lineage—especially in the context of agent auditability and trust.
  • Experience writing clean, testable code in Python, Java, or Scala; experience with PySpark/Spark for distributed data processing.
  • Demonstrated ability to write optimized, scalable SQL and to work with large datasets across cloud-native and open-source platforms.
  • Hands-on experience with Kafka, Docker/Kubernetes, and cloud platforms such as Azure and GCP.
  • A strong understanding of DevOps principles, CI/CD pipelines, system observability, and deployment patterns for applications with both frontend and backend components.
  • A demonstrated ability to design and maintain scalable, reliable, high-performing Full Stack applications.
  • Success building enterprise-level systems that include both user-facing interfaces and backend services.
  • Strong experience with API design, performance optimization, and scalable solution development across the full stack.

Nice to have

  • Exposure to LLM-driven workflows, prompt templating, or orchestration tools (e.g., LangChain, LlamaIndex, CrewAI) is beneficial but not required.
  • Familiarity with tools like Kafka, Spark Streaming, Druid, and Presto, and how they interact in real-time or hybrid data systems.
  • Advanced experience with Java and Spring Boot, along with strong experience using modern frontend frameworks such as React, Angular, or Vue.js.
  • Experience working with SQL (Azure SQL) and NoSQL (Cosmos DB, Cassandra, MongoDB) databases in both backend services and Full Stack systems.

What the JD emphasized

  • Agentic AI systems
  • data-aware AI agents
  • agent auditability
  • autonomous decision-making
  • autonomous workflows

Other signals

  • designing and building data solutions for AI agents
  • enabling integration between data and AI agents
  • data governance for agent auditability