Lead Machine Learning Engineer

Disney · Media · Orlando, FL +2

The Lead Machine Learning Engineer designs, develops, and deploys AI/ML solutions, including agentic systems, multi-modal models, RAG, and Responsible AI applications. The role involves building an AI enablement platform, driving the full AI/ML lifecycle, and providing technical leadership for production-scale AI systems, evaluation frameworks, and responsible AI practices.

What you'd actually do

  1. Develop sophisticated, production-scale AI systems, including multi-step agentic workflows and multi-agent orchestration platforms.
  2. Build tools & agents with advanced capabilities in reasoning, planning, and adaptive tool utilization to address complex business challenges.
  3. Drive complete ownership of the AI/ML lifecycle – encompassing implementation, comprehensive testing, deployment, and continuous operational monitoring – delivering projects on schedule and to specification.
  4. Produce high-quality, maintainable code for model training pipelines, evaluation frameworks, and inference services that meet production standards.
  5. Design and implement responsible AI frameworks including hallucination detection, safety guardrails, comprehensive evaluation systems, and observability infrastructure to ensure model reliability, accuracy, and ethical deployment.

Skills

Required

  • Designing, building, and deploying AI/ML solutions at scale
  • Production experience in Generative AI technologies
  • Machine learning, including statistical modeling and supervised and unsupervised learning algorithms
  • Prompt engineering with deep understanding of optimization techniques and best practices for LLM interactions
  • Expert-level programming proficiency in Python and AI/ML development ecosystems
  • Modern AI frameworks including LLM application development and agentic systems (LangChain, CrewAI, or similar)
  • MLOps experience with hands-on implementation of CI/CD pipelines, model monitoring, versioning, and lifecycle management for both models and agent-based systems
  • Production deployment experience on major cloud platforms (AWS, Azure, or GCP)
  • Architecting and scaling cloud-native ML solutions
  • ML skillset spanning traditional techniques (classification, regression, clustering) and cutting-edge deep learning approaches
  • Production-grade generative AI experience deploying and maintaining LLMs and multi-modal models in live environments
  • Analytical capabilities with a track record of solving complex technical problems and thriving in ambiguous, rapidly evolving situations
  • Industry-standard ML libraries including PyTorch, TensorFlow, Scikit-learn, NumPy, and Pandas
  • Communication and collaboration skills with ability to translate complex technical concepts for diverse audiences and drive cross-functional alignment
  • Partnering across organizational levels from individual contributors to senior leadership, building trust and delivering results
  • Influencing and leading in matrix organizations where collaboration and relationship-building are essential to achieving outcomes

Nice to have

  • Vector databases and embedding technologies
  • Specialized expertise in AI safety and responsible AI using evaluation tools such as Arize, Langfuse, TruLens, or equivalent platforms for hallucination detection, bias mitigation, and model performance assessment
  • Advanced ML techniques including reinforcement learning

What the JD emphasized

  • production-scale AI systems
  • agentic systems
  • multi-modal models
  • Responsible AI applications
  • multi-step agentic workflows
  • multi-agent orchestration platforms
  • advanced capabilities in reasoning, planning, and adaptive tool utilization
  • complete ownership of the AI/ML lifecycle
  • production standards
  • responsible AI frameworks
  • hallucination detection
  • safety guardrails
  • comprehensive evaluation systems
  • observability infrastructure
  • model reliability, accuracy, and ethical deployment
  • comprehensive evaluation frameworks for Large Language Models and agent-based systems
  • model quality, task success rates, safety compliance, and operational effectiveness
  • production-grade generative AI experience deploying and maintaining LLMs and multi-modal models in live environments
  • production deployment experience on major cloud platforms
  • architect and scale cloud-native ML solutions
  • MLOps experience with hands-on implementation of CI/CD pipelines, model monitoring, versioning, and lifecycle management for both models and agent-based systems

Other signals

  • design, develop, and deploy high-impact AI/ML solutions
  • build an AI enablement platform
  • design, develop, and implement robust, enterprise-grade AI/ML solutions, including agentic systems, multi-modal models, RAG, and Responsible AI applications
  • Develop sophisticated, production-scale AI systems, including multi-step agentic workflows and multi-agent orchestration platforms
  • Drive complete ownership of the AI/ML lifecycle
  • Produce high-quality, maintainable code for model training pipelines, evaluation frameworks, and inference services
  • Design and implement responsible AI frameworks including hallucination detection, safety guardrails, comprehensive evaluation systems, and observability infrastructure
  • Establish comprehensive evaluation frameworks for Large Language Models and agent-based systems
  • Drive innovation through research and experimentation with emerging AI technologies and frameworks