Machine Learning Platform - Lead Engineer

Allstate · Insurance · IL, United States · Remote

Lead Engineer for an enterprise ML platform, responsible for architecting, building, and scaling core services such as training infrastructure, feature stores, model registries, and inference runtimes. The role drives MLOps automation and cloud-native engineering on Azure, AWS, or GCP, and enables reliable, scalable, and responsible ML adoption across the company.

What you'd actually do

  1. Serve as the technical lead for ML platform architecture, guiding system design, scalability, performance, and reliability across platform components.
  2. Architect and build core ML platform services, including training and compute infrastructure, feature stores, model registries, inference runtimes, and data pipelines.
  3. Drive architectural decisions for distributed systems, cloud-native frameworks, and automated MLOps workflows that support enterprise-scale machine learning.
  4. Evaluate and integrate emerging ML platform technologies, tools, and best practices to continuously strengthen platform capabilities.
  5. Design and implement robust MLOps pipelines for experiment tracking, data and model versioning, CI/CD for ML, automated retraining, and model governance.

Skills

Required

  • Python
  • backend development
  • MLOps tools
  • ML frameworks
  • model deployment techniques
  • ML lifecycle automation
  • cloud platforms
  • Azure ML Studio
  • AWS SageMaker
  • Google Vertex AI
  • Kubernetes
  • Docker
  • CI/CD
  • Terraform
  • Infrastructure-as-Code
  • system design
  • APIs
  • data pipelines
  • scalable ML infrastructure patterns

Nice to have

  • Azure Fabric/OneLake
  • AWS S3
  • Google Cloud Storage (GCS)
  • Azure Defender
  • Microsoft Purview
  • AWS Security Hub
  • Amazon Inspector
  • GCP Security Command Center

What the JD emphasized

  • extensive experience in ML engineering, platform engineering, or large-scale distributed systems
  • deep hands-on expertise with MLOps tools, ML frameworks, model deployment techniques, and ML lifecycle automation
  • strong proficiency in Python and backend development for machine learning systems
  • experience with cloud platforms and ML services, including Azure ML Studio, AWS SageMaker, and/or Google Vertex AI
  • solid knowledge of system design, APIs, data pipelines, and scalable ML infrastructure patterns
  • proven ability to lead technical initiatives and influence cross-team engineering decisions

Other signals

  • ML platform
  • MLOps
  • model deployment
  • enterprise-scale ML