Machine Learning Engineer III

Chewy · Retail · Boston, MA

This Machine Learning Engineer III role focuses on designing and implementing cloud architectures for end-to-end machine learning workflows, including data preprocessing, model training, and deployment. The role involves using Infrastructure as Code (IaC) tools, containerization, and orchestration to deploy and scale ML applications. Required experience includes AWS tools such as SageMaker, along with PyTorch, TensorFlow, Docker, Kubernetes, MLflow, and inference optimization.

What you'd actually do

  1. Design and implement cloud architectures tailored for end-to-end machine learning workflows (including data preprocessing, model training, and deployment), ensuring scalability, reliability, and performance.
  2. Utilize Infrastructure as Code (IaC) tools, such as Terraform or AWS CloudFormation, to automate the provisioning and management of cloud resources for machine learning.
  3. Implement and manage containerization solutions (e.g., Docker) and orchestration tools (e.g., Kubernetes) for deploying and scaling machine learning applications.

Skills

Required

  • Cloud architecture design
  • Machine learning workflows
  • Data preprocessing
  • Model training
  • Model deployment
  • Scalability
  • Reliability
  • Performance optimization
  • Infrastructure as Code (IaC)
  • Terraform
  • AWS CloudFormation
  • Containerization
  • Docker
  • Orchestration tools
  • Kubernetes
  • Amazon Web Services (AWS)
  • Redshift
  • Snowflake
  • SageMaker
  • R
  • PySpark
  • Spark
  • Scala
  • Java
  • PyTorch
  • TensorFlow
  • Jenkins
  • MLflow
  • Model serialization
  • Inference optimization
  • API development

What the JD emphasized

  • Master’s degree or foreign equivalent in Operations Research, Data Science, Computer Engineering, Computer Science, Statistics, Applied Mathematics, or a related field, plus 5 years of experience as an Operations Research Analyst or in a related position/occupation.
  • Amazon Web Services tools such as Redshift and SageMaker, or similar platforms such as Snowflake.
  • R, PySpark, Spark, Scala, Java, PyTorch, TensorFlow, Docker.
  • Jenkins.
  • Terraform.
  • MLflow.
  • Model Serialization.
  • Inference Optimization.
  • API development.

Other signals

  • end-to-end machine learning workflows
  • model training
  • deployment
  • MLOps
  • cloud architectures