(USA) Staff, Data Scientist

Walmart · Retail · Sunnyvale, CA

Staff Data Scientist role at Walmart focused on building scalable end-to-end data science solutions for Walmart Marketplace. Responsibilities include owning the MLOps lifecycle, developing scalable MLOps infrastructure, applying SRE principles to ML workloads, leading code quality, providing architectural guidance, building training and inference pipelines, enhancing data feedback workflows, and deploying end-to-end models. Requires strong ML foundations, Python, big data platforms, and SQL.

What you'd actually do

  1. Own the MLOps lifecycle, from data monitoring to refactoring data science code to building robust model monitoring workflows for model lifecycle management.
  2. Develop and maintain scalable MLOps infrastructure, such as a Kafka pipeline for unified logging and monitoring across multiple projects.
  3. Apply SRE (Site Reliability Engineering) principles for ML workloads, optimizing cloud deployments, monitoring performance and troubleshooting application latency or operational issues.
  4. Lead code quality and engineering best practices by conducting code reviews and mentoring junior MLEs.
  5. Build continuous, distributed, and scalable training, inference, and performance monitoring pipelines for use across several Marketplace initiatives.

Skills

Required

  • Foundations of machine learning and statistics
  • Machine learning: supervised, unsupervised, and deep learning
  • Analyzing complex problems and translating them into data science algorithms
  • Big data analytics
  • Python
  • Data structures
  • Big data platforms – Hadoop (Hive, Pig, MapReduce, Scala, Spark)
  • Git
  • SQL
  • Relational databases
  • Data warehouses

What the JD emphasized

  • scalable end-to-end data science solutions
  • scalable MLOps infrastructure
  • scalable, robust technical solutions
  • continuous, distributed and scalable training pipeline, inference pipeline and performance monitoring pipeline
  • scalable

Other signals

  • MLOps lifecycle
  • scalable MLOps infrastructure
  • SRE principles for ML workloads
  • continuous, distributed and scalable training pipeline, inference pipeline and performance monitoring pipeline
  • Build and deploy end-to-end models