Senior Machine Learning Operations Developer: AI/ML Platform

Autodesk · Enterprise · Toronto, ON +2

Autodesk is seeking a Senior Machine Learning Operations Developer to join its AI/ML Platform team. The role focuses on operationalizing machine learning models and keeping the AI/ML platform efficient; the platform supports the development of machine learning and generative AI solutions. Responsibilities include driving operational excellence, automating deployment pipelines, building scalable infrastructure for training and inference, implementing monitoring and logging systems, and contributing to model governance and compliance.

What you'd actually do

  1. Drive the operational excellence of our AI/ML Platform by implementing and optimizing MLOps practices
  2. Design and implement automated deployment pipelines for machine learning models, ensuring seamless transitions from development to production
  3. Collaborate with cross-functional teams to design, implement, and maintain scalable infrastructure for model training, inference, and data processing
  4. Develop and maintain robust monitoring and logging systems to track model performance, system health, and overall platform efficiency
  5. Work closely with data engineers to ensure efficient data pipelines for model training and validation

Skills

Required

  • DevOps
  • MLOps
  • deploying and managing machine learning models in production
  • Infrastructure as Code (IaC)
  • Terraform
  • Ansible
  • containerization
  • Docker
  • Kubernetes
  • CI/CD
  • Python
  • Bash
  • monitoring tools
  • Prometheus
  • Grafana
  • ELK Stack
  • security best practices in MLOps
  • collaboration skills
  • problem-solving skills

Nice to have

  • AWS
  • Azure
  • SQL
  • NoSQL
  • data lakes
  • TensorFlow
  • PyTorch
  • Git
  • Jira
  • Agile development methodologies

What the JD emphasized

  • 5+ years of hands-on experience in DevOps and MLOps
  • deploying and managing machine learning models in production environments
  • scalable infrastructure for model training, inference, and data processing
  • monitoring and logging systems to track model performance
  • model governance practices

Other signals

  • MLOps
  • AI/ML Platform
  • deployment automation
  • scalable infrastructure
  • monitoring and logging
  • model governance