Senior Manager, Data Science

Walmart · Retail · Bentonville, AR

Senior Manager, Data Science role focused on the end-to-end lifecycle of machine learning models, from problem formulation and data analysis through model development, deployment, and monitoring in production. The role uses advanced ML techniques, Python, PySpark, and cloud platforms (Azure Databricks, Google Cloud) to build and scale data science solutions.

What you'd actually do

  1. Technical Problem Formulation: Analyze the business problem within your discipline and question assumptions to help the business identify the root cause.
  2. Analytical Modeling: Select and develop variables and features iteratively, based on model responses, in collaboration with the business.
  3. Model Deployment and Scaling: Deploy models to production.
  4. Code Development and Testing: Write code for the required solution and application features, choosing the appropriate programming language and drawing on business, technical, and data requirements.
  5. Data Visualization: Generate appropriate graphical representations of data and model outcomes.

Skills

Required

  • Designing and implementing scalable machine learning models incorporating data pipelines, model training, inferencing, versioning and monitoring
  • Coding in Python, an object-oriented programming language
  • Writing unit tests using pytest and unittest for Python
  • Designing and implementing APIs in Python using FastAPI
  • Developing, optimizing and deploying machine learning models using PySpark and Azure Databricks
  • Extracting and preprocessing large datasets from Google Cloud Storage and Google BigQuery for machine learning models
  • Implementing CI/CD pipelines using Jenkins and using Git for source control management
  • Implementing a platform for tracking data science experiments, managing models, and managing model versions using MLflow
  • Developing web applications using Python with Plotly Dash
  • Developing event-driven systems using Google Pub/Sub for message queuing and real-time communication between services
  • Using caching strategies that store high-volume static API responses in disk stores and a Redis cache to improve performance

What the JD emphasized

  • Designing and implementing scalable machine learning models incorporating data pipelines, model training, inferencing, versioning and monitoring
  • Developing, optimizing and deploying machine learning models using PySpark and Azure Databricks
  • Extracting and preprocessing large datasets from Google Cloud Storage and Google BigQuery for machine learning models
  • Implementing CI/CD pipelines using Jenkins and using Git for source control management
  • Implementing a platform for tracking data science experiments, managing models, and managing model versions using MLflow

Other signals

  • deploy models to production
  • model deployment and scaling
  • advanced machine learning algorithms
  • train algorithms to apply models to new data sets
  • model assessment and validation
  • continuously log and track model behavior once it is deployed