Staff, Data Scientist

Walmart · Retail · Bentonville, AR

Staff Data Scientist role focused on developing, deploying, and scaling AI/ML solutions, particularly involving LLMs, VLMs, RAG, and vector databases, within Walmart's Global Tech International Data Science team. The role involves leading data science projects, building and training models, and influencing business decisions through data-derived insights and advanced analytical techniques.

What you'd actually do

  1. Drive data-derived insights across a wide range of retail divisions by developing advanced statistical models, machine learning algorithms and computational algorithms based on business initiatives
  2. Direct the gathering of data, assess data validity and synthesize data into large analytics datasets to support project goals
  3. Lead a team of Data Scientists to solve problems in the Global Sourcing domain using advanced Machine Learning and Gen AI
  4. Utilize big data analytics and advanced data science techniques to identify trends, patterns, and discrepancies in data
  5. Build and train AI/ML models that can be replicated in future projects

Skills

Required

  • Bachelor's degree with more than 10 years, Master's degree with more than 8 years, or Ph.D. with more than 5 years of relevant experience, in Computer Science, Statistics, Mathematics, or another quantitative discipline
  • Ability to lead data science projects end to end
  • Strong experience in machine learning, supervised and unsupervised: NLP, Classification, Data/Text Mining, Multi-modal supervised and unsupervised models, Neural Networks, Deep Learning Algorithms, Generative AI models
  • Experience in analyzing complex problems and translating them into analytical solutions
  • Experience in machine learning: classification models, regression models, NLP, forecasting, unsupervised models, optimization, recommendation models, deep learning, graph ML, causal inference, causal ML, statistical learning, experimentation
  • Ability to use the outcomes of data science solutions to drive predictive and prescriptive analytics
  • Experience with big data analytics: identifying trends, patterns, and outliers in large volumes of data
  • Ability to scale and deploy data science solutions
  • Experience in LLMs, VLMs, embedding generation from multimodal data, storage and retrieval from vector databases, setup and provisioning of managed LLM gateways, development of retrieval-augmented generation (RAG) based LLM agents, model selection, iterative prompt engineering and fine-tuning based on accuracy and user feedback, monitoring and governance
  • Strong experience in Python, PySpark, OpenCV, C++, Keras, TensorFlow, PyTorch, big data platforms like Hadoop, Google Cloud Platform, Vertex AI, Kubeflow, model deployment, Kafka streaming, API development & deployment, CI/CD, MLOps

Nice to have

  • Consult with business stakeholders regarding algorithm-based recommendations
  • Guide data scientists and senior data scientists across the domain DS team to ensure on-time delivery of ML products
  • Lead multiple complex ML products and guide senior tech leads in the domain in efficiently leading their products
  • Proactively identify complex business problems that can be solved using advanced ML and Generative AI, finding opportunities and gaps in the current business domain
  • Evaluate proposed business cases for projects and initiatives
  • Translate business requirements into strategies, initiatives, and projects, align them to business strategy and objectives, and drive the execution of deliverables
  • Set relevant deliverables based on the established success criteria and define key metrics to measure progress and effectiveness of the solution
  • Quantify business impact and ensure regular impact measurement of all ML products in the domain
  • Identify and review model evaluation metrics based on analytical requirements
  • Ensure testing information is documented and maintained by the team
  • Play a key role in solving complex problems pivotal to Walmart's business, and drive actionable insights
  • Use a product mindset to build, scale and deploy holistic data science products after successful prototyping
  • Demonstrate an incremental solution approach, with the agility and flexibility to overcome practical problems
  • Articulate and present recommendations to business partners and influence plans based on insights
  • Partner and engage with associates in other regions to deliver the best services to customers around the globe
  • Work with a customer-centric mindset to deliver high-quality business-driven analytic solutions
  • Drive innovation in approach, method, practices, process, outcome, delivery, or any component of end-to-end problem solving
  • Proactively engage with the external community to build Walmart's brand and learn more about industry practices
  • Promote and support company policies, procedures, mission, values, and standards of ethics and integrity

What the JD emphasized

  • advanced statistical models
  • machine learning algorithms
  • computational algorithms
  • advanced Machine Learning
  • Gen AI
  • big data analytics
  • advanced data science techniques
  • AI/ML models
  • data science solutions
  • advanced ML
  • Generative AI
  • ML products
  • LLMs
  • VLMs
  • embedding generation from multimodal data
  • storage and retrieval from Vector Databases
  • managed LLM gateways
  • Retrieval augmented generation based LLM agents
  • model selection
  • iterative prompt engineering
  • finetuning
  • monitoring and governance
  • Python
  • PySpark
  • OpenCV
  • Keras
  • TensorFlow
  • PyTorch
  • big data platforms
  • Google Cloud Platform
  • Vertex AI
  • Kubeflow
  • model deployment
  • Kafka streaming
  • API development & deployment
  • CI/CD
  • MLOps

Other signals

  • Develop advanced statistical models, machine learning algorithms and computational algorithms
  • Build and train AI/ML models that can be replicated in future projects
  • Deploy and maintain the data science solutions
  • Experience in LLMs, VLMs, embedding generation from multimodal data, storage and retrieval from vector databases, setup and provisioning of managed LLM gateways, development of retrieval-augmented generation (RAG) based LLM agents, model selection, iterative prompt engineering and fine-tuning based on accuracy and user feedback, monitoring and governance