Senior Machine Learning Engineer, ML Systems and Infrastructure

Autodesk · Enterprise · Boston, MA +25 · Remote

Autodesk is seeking a Senior ML Engineer to design and scale ML systems and infrastructure, focusing on data pipelines, distributed training, evaluation frameworks, and production ML workflows for foundation models and ML-powered features. The role emphasizes scalable systems and production-grade ML infrastructure, and the engineer is expected to operate independently across the stack.

What you'd actually do

  1. Design and build scalable systems for ML training, evaluation, deployment, and monitoring
  2. Develop and improve data pipelines that process large-scale structured and semi-structured technical datasets
  3. Optimize distributed workflows for performance, reliability, resource utilization, and cost efficiency
  4. Build platform capabilities such as experiment tracking, model versioning, checkpointing, reproducibility, and observability
  5. Contribute to model deployment, inference services, and production monitoring workflows

Skills

Required

  • Python
  • cloud platforms (AWS, Azure, or GCP)
  • containers
  • CI/CD
  • observability
  • release quality practices

Nice to have

  • data lineage
  • provenance
  • governance
  • responsible data usage in ML systems
  • Ray
  • Airflow
  • Spark
  • model deployment
  • inference services
  • monitoring
  • observability for production ML systems
  • geometry, graph, hierarchical, or multimodal data
  • PyTorch
  • Lightning
  • DeepSpeed
  • FSDP
  • Megatron
  • AEC workflows
  • design data
  • BIM/CAD formats
  • Autodesk products

What the JD emphasized

  • At least 3 to 4 years of industry experience building and operating production software, ML systems, distributed infrastructure, or large-scale data pipelines
  • Strong experience in software engineering, distributed systems, backend systems, or ML infrastructure
  • Experience designing and operating scalable data or compute pipelines
  • Ability to independently drive technical execution on complex work with limited oversight

Other signals

  • ML Systems and Infrastructure
  • scalable systems
  • production-grade ML infrastructure
  • large-scale data pipelines
  • distributed training systems
  • evaluation frameworks
  • production ML workflows
  • foundation models
  • ML-powered product features