Data Scientist II

Chewy · Retail · Richardson, TX

Data Scientist II role focused on developing and deploying machine learning infrastructure and data pipelines in production environments. Responsibilities include building, deploying, and scaling ETL pipelines and ML models, querying and transforming large datasets, developing and evaluating supervised and unsupervised models, and communicating insights to management. Requires a Master's degree and 3 years of experience with ML frameworks, cloud services (Lambda, Glue, S3, SageMaker), distributed computing (Snowflake, Spark), Python, Docker, and visualization tools.

What you'd actually do

  1. Develop machine learning infrastructure and data pipelines that improve data quality.
  2. Use machine learning frameworks in development, testing, and production environments to create and deploy new technologies.
  3. Identify opportunities for data science to improve current products and practices for business engineering teams.
  4. Create machine learning algorithms to optimize and deliver results by reducing computational complexity, increasing the accuracy of models, and improving business metrics.
  5. Perform data ETL and statistical analyses, and communicate insights and recommendations to Chewy management to support informed decisions.

Skills

Required

  • Master's degree in Science, Business Analytics, Industrial Engineering or related field
  • 3 years of experience as a Data Scientist
  • Building, deploying, and scaling ETL pipelines and machine learning models in production environments
  • Querying, transforming, and managing large-scale datasets using distributed computing frameworks and cloud data warehouses
  • Developing, training, and evaluating both supervised and unsupervised models
  • Python
  • Docker
  • PySpark
  • Tableau
  • Matplotlib
  • Seaborn

Nice to have

  • Lambda
  • Glue
  • S3
  • Athena
  • SageMaker
  • Snowflake
  • Spark
  • Scikit-learn
  • PyTorch
  • TensorFlow

What the JD emphasized

  • Building, deploying, and scaling ETL pipelines and machine learning models in production environments
  • Querying, transforming, and managing large-scale datasets using distributed computing frameworks and cloud data warehouses
  • Developing, training, and evaluating both supervised and unsupervised models for experimentation and business optimization

Other signals

  • Deploying and scaling ML models in production
  • Building ML algorithms to optimize and deliver results
  • Developing, training, and evaluating supervised and unsupervised models