Sr Machine Learning Engineer

Disney · Media · Lake Buena Vista, FL +5

This role focuses on building, deploying, and operating machine learning models for self-healing infrastructure management systems. It involves designing, training, and deploying models for anomaly detection, forecasting, and predictive analytics, as well as building near-real-time inference pipelines and closed-loop, event-driven systems that trigger automated remediation actions. The role owns the full ML lifecycle and integrates AI/ML-driven insights into operational tools and workflows.

What you'd actually do

  1. Architect, design, and implement reusable machine learning frameworks, patterns, and services that integrate into the enterprise automation and observability platforms
  2. Design, train, and deploy machine learning models in distributed environments for anomaly detection, forecasting, predictive analytics, event correlation, pattern recognition, classification, causal analysis, and related tasks, with the goal of surfacing leading indicators of failure
  3. Build near-real-time inference pipelines that generate actionable insights from live telemetry, including continuous streams of metrics, logs, traces, and operational events
  4. Create data abstractions and perform feature engineering on high-volume, high-cardinality telemetry data
  5. Evaluate model performance using real production signals and continuously iterate to improve accuracy and reliability

Skills

Required

  • software engineering experience
  • automation
  • machine learning
  • AI technologies
  • building production-grade ML models
  • inference pipelines
  • PyTorch
  • TensorFlow
  • Scikit-learn
  • Python
  • JavaScript
  • TypeScript
  • Go
  • Rust
  • event-driven or streaming data
  • statistics
  • data analysis
  • applied machine learning techniques
  • large-scale, real-world datasets

Nice to have

  • observability platforms
  • DevOps
  • digital twins

What the JD emphasized

  • production-grade ML models
  • full machine learning lifecycle
  • AI/ML-driven automation
  • AI/ML-driven insights

Other signals

  • building production-grade ML models
  • deploying machine learning models
  • full machine learning lifecycle
  • self-healing infrastructure management systems
  • applying predictive modeling and AI to large-scale telemetry