Machine Learning Engineer, ML Systems and Infrastructure

Autodesk · Enterprise · Toronto, ON +10 · Remote

Machine Learning Engineer focused on building and operating ML systems at scale, including pipelines for data preparation, model training, evaluation, deployment, and monitoring. The role involves developing reliable software and infrastructure for scalable machine learning workflows, contributing to distributed data processing and training systems, and supporting data ingestion and transformation for large-scale datasets.

What you'd actually do

  1. Build and maintain components of ML pipelines for data preparation, model training, evaluation, deployment, and monitoring
  2. Develop reliable software and infrastructure that supports scalable machine learning workflows
  3. Contribute to distributed data processing and training systems used by researchers and engineering teams
  4. Support data ingestion, transformation, validation, and serving for large-scale structured and semi-structured technical datasets
  5. Improve automation, testing, CI/CD, observability, and operational reliability for ML systems

Skills

Required

  • Python
  • cloud platforms (AWS, Azure, GCP)
  • containers
  • version control
  • CI/CD
  • modern development workflows
  • data-intensive systems
  • backend systems
  • ML pipelines
  • software engineering fundamentals
  • coding
  • testing
  • debugging
  • code quality

Nice to have

  • data pipelines for large-scale structured and semi-structured technical datasets
  • data lineage, provenance, governance, and responsible data usage in ML systems
  • distributed data processing and orchestration systems (Ray, Airflow, Spark, or similar)
  • model deployment
  • inference services
  • monitoring
  • observability for production ML systems
  • ML-ready representations for geometry, graph, hierarchical, or multimodal data
  • CAD, BIM, AEC, or other complex domain-specific data formats

Other signals

  • ML Systems and Infrastructure
  • scalable pipelines
  • training infrastructure
  • data workflows
  • production-ready ML systems
  • distributed training workflows
  • data processing pipelines
  • model evaluation infrastructure
  • deployment systems
  • platform tooling