Senior Data Scientist

Walmart · Retail · Bangalore, KA, India

Senior Data Scientist at Walmart Global Tech, focused on applying ML and advanced analytics to retail business problems. The role involves developing statistical models, ML algorithms, and computational algorithms; working with big data; building and training models; and communicating insights. Key responsibilities include leading teams, partnering with business stakeholders, and driving innovation. The role requires specific experience in embedding generation, vector databases, LLM gateways, RAG agents, prompt engineering, and fine-tuning, using Python, PySpark, and Google Cloud Platform (Vertex AI, Kubeflow).

What you'd actually do

  1. Drive data-derived insights across a wide range of retail divisions by developing advanced statistical models, machine learning algorithms, and computational algorithms based on business initiatives
  2. Direct the gathering of data, assessing data validity and synthesizing data into large analytics datasets to support project goals
  3. Utilize big data analytics and advanced data science techniques to identify trends, patterns, and discrepancies in data. Determine additional data needed to support insights
  4. Build and train statistical models and machine learning algorithms that can be replicated in future projects
  5. Communicate recommendations to business partners and influence future plans based on insights

Skills

Required

  • Bachelor's degree with more than 7 years of relevant experience, OR Master's degree with more than 5 years of relevant experience, OR PhD in Computer Science/Statistics/Mathematics with more than 3 years of relevant experience
  • Experience analyzing complex problems and translating them into data science algorithms
  • Experience in machine learning, supervised and unsupervised: NLP, Classification, Data/Text Mining, Multi-modal supervised and unsupervised models, Neural Networks, Deep Learning Algorithms
  • Experience in statistical learning: Predictive & Prescriptive Analytics, Web Analytics, Parametric and Non-parametric models, Regression, Time Series, Dynamic/Causal Model, Statistical Learning, Guided Decisions, Topic Modeling
  • Experience with big data analytics - identifying trends, patterns, and outliers in large volumes of data
  • Embedding generation from training materials; storage and retrieval from vector databases; setup and provisioning of managed LLM gateways; development of retrieval-augmented generation (RAG) based LLM agents; model selection; iterative prompt engineering and fine-tuning based on accuracy and user feedback; monitoring and governance
  • Strong experience in Python and PySpark
  • Google Cloud Platform, Vertex AI, Kubeflow, model deployment
  • Strong experience with big data platforms: Hadoop (Hive, MapReduce, Scala)

Nice to have

  • Domain knowledge of one or more divisions in Retail, preferably in the operate domain and Merchandising
  • Published papers or given talks in leading academic and research journals
  • Published papers or given talks in Data Science Forums
  • Holds data science-related patents
  • Experience with big data platforms: Hadoop (Hive, Pig, MapReduce, HQL) / Spark
  • Experience in deep learning, including work with TensorFlow and Torch
  • Experience with GPU/CUDA for computational efficiency

What the JD emphasized

  • Embedding generation from training materials; storage and retrieval from vector databases; setup and provisioning of managed LLM gateways; development of retrieval-augmented generation (RAG) based LLM agents; model selection; iterative prompt engineering and fine-tuning based on accuracy and user feedback; monitoring and governance

Other signals

  • deploy Machine Learning algorithms
  • build reference architectures and machine learning pipelines
  • productize our solutions
  • build, scale and deploy holistic data science products