Staff, Software Engineer

Walmart · Retail · Sunnyvale, CA

Staff Software Engineer role focused on building and deploying large-scale ML systems, including deep learning models and real-time inference services, for consumer personalization. The work involves implementing techniques such as Retrieval-Augmented Generation (RAG) and using multimodal data to power recommendations and customer engagement, with an emphasis on agent frameworks and on technologies such as vector databases and knowledge graphs.

What you'd actually do

  1. Design, build, and deploy large-scale, production-grade ML systems, including deep learning models, real-time inference services, and end-to-end ML pipelines.
  2. Implement advanced techniques like Retrieval-Augmented Generation (RAG) to provide agents with a comprehensive understanding of long-term customer history, preferences, and intent, as well as short-term session context.
  3. Champion the use of multimodal data (text, images, user behavior, contextual signals) to power hyper-personalized recommendations, product discovery, and proactive customer engagement.
  4. Participate in medium- to large-scale, complex, cross-functional projects: review project, product, and business requirements and translate them into technical solutions; gather requested information (for example, design documents and product requirements); design robust, scalable architectures; write code and conduct unit testing; communicate status and issues to team members and stakeholders; collaborate with cross-functional teams; troubleshoot open issues and fix bugs; improve designs to prevent defects from recurring; and ensure on-time delivery.

Skills

Required

  • Python
  • PyTorch
  • TensorFlow
  • ML engineering principles
  • data pipelines
  • model training
  • serving infrastructure
  • vector databases
  • knowledge graphs
  • semantic search techniques
  • event-driven architecture using Kafka
  • complex software design
  • distributed system design
  • design patterns
  • data structures
  • algorithms
  • Kubernetes
  • Docker
  • CI/CD tools
  • Jenkins
  • Maven
  • cloud technologies
  • Azure
  • GCP
  • monitoring production systems
  • Grafana
  • Prometheus
  • software development life cycle
  • best practices
  • Agile software development

Nice to have

  • exploring and learning new technologies

What the JD emphasized

  • production-grade ML systems at scale
  • e-commerce, search, or recommendation systems
  • generative AI
  • LLMs
  • RAG systems
  • autonomous agent frameworks

Other signals

  • large-scale ML systems
  • real-time inference services
  • end-to-end ML pipelines
  • Retrieval-Augmented Generation (RAG)
  • multimodal data
  • generative AI
  • LLMs
  • autonomous agent frameworks
  • vector databases
  • knowledge graphs
  • semantic search