Distinguished Software Engineer, AI/ML Engineer - Walmart Connect

Walmart · Retail · Sunnyvale, CA

This Distinguished Software Engineer (AI/ML Engineer) role at Walmart Connect focuses on architecting and deploying advanced Gen AI and ML solutions for the advertising platform. It owns the end-to-end strategy for LLMs, multimodal models, RAG, and autonomous agents, with a strong emphasis on scaling these solutions for enterprise-level retail media. Key responsibilities include shaping the AI engineering roadmap; leading model pre-training, fine-tuning, and alignment; architecting RAG pipelines and autonomous agents; optimizing inference for ultra-low latency; and establishing frameworks for safety and compliance. The role requires deep expertise in LLM orchestration, multi-agent architectures, and integrating AI into core ad systems, along with staying at the forefront of research and translating it into scalable products.

What you'd actually do

  1. Shape the long-term AI engineering roadmap for Walmart Ads, aligning it with the org vision of becoming the definitive AI-native omnichannel retail media network (RMN).
  2. Lead the strategy for pre-training, fine-tuning, and aligning (RLHF/DPO) open-weight foundation models explicitly for the retail media domain.
  3. Architect highly scalable Retrieval-Augmented Generation pipelines that ground LLM outputs in real-time inventory, pricing, and hyper-local store data.
  4. Architect and deploy autonomous AI agents capable of complex reasoning, tool-use, and multi-step planning. Use cases include autonomous campaign optimization, real-time budget reallocation based on live market signals, and dynamic audience discovery.
  5. Drive inference optimization (quantization, vLLM, TensorRT) to ensure generative models are cost-efficient and meet the strict ultra-low latency requirements (sub-100ms) of programmatic ad exchanges and real-time bidding (RTB) environments.

Skills

Required

  • Python
  • PyTorch
  • JAX
  • TensorFlow
  • LLM orchestration frameworks (e.g., LangChain, LlamaIndex)
  • reasoning frameworks (Chain-of-Thought, ReAct)
  • multi-agent architectures
  • highly available, low-latency distributed systems
  • cloud-native architectures (GCP, Azure, or AWS)
  • scalable vector databases (Milvus, Pinecone)
  • feature stores
  • real-time streaming pipelines (Kafka, Flink)
  • programmatic advertising
  • Real-Time Bidding (RTB)
  • click-through rate (CTR) prediction
  • dynamic creative optimization

Nice to have

  • Go
  • Rust
  • C++
  • vLLM
  • TensorRT
  • advertising technology
  • retail media
  • e-commerce
  • PhD or Master’s degree in Computer Science, Machine Learning, Statistics, or related field
  • Patents or publications in top-tier AI conferences (NeurIPS, ICML)

What the JD emphasized

  • own the end-to-end engineering & ML strategy and delivery of custom LLMs, multimodal generative models, RAG pipelines, and autonomous multi-agent systems
  • define the technical vision for agentic retail media at enterprise scale
  • autonomous, self-optimizing advertising experiences
  • pre-training, fine-tuning, and aligning (RLHF/DPO) open-weight foundation models
  • autonomous AI agents capable of complex reasoning, tool-use, and multi-step planning
  • ultra-low latency requirements (sub-100ms)
  • handling millions of queries per second (QPS)
  • compliance with data privacy regulations
  • translating complex AI research into clear, actionable product roadmaps
  • Patents or publications in top-tier AI conferences (NeurIPS, ICML) are a strong plus

Other signals

  • architecting, developing, and deploying advanced Gen AI and ML solutions
  • Drive inference optimization (quantization, vLLM, TensorRT) to ensure generative models are cost-efficient and meet the strict ultra-low latency requirements (sub-100ms) of programmatic ad exchanges and real-time bidding (RTB) environments
  • Establish the frameworks for brand safety, bias mitigation, hallucination detection, and compliance with data privacy regulations