Senior Director, AI Model Lifecycle

Crusoe · Data AI · San Francisco, CA - US · Cloud Engineering

Senior Director role focused on establishing a team and platform for the entire ML model application development lifecycle, with an emphasis on LLMs. Responsibilities include managing fine-tuning systems, end-to-end training pipelines, distillation, and dataset/model management.

What you'd actually do

  1. Build a team of machine learning experts and serve as site leader for the Model Lifecycle team.
  2. Manage fine-tuning systems for large foundation models (SFT, PEFT, LoRA, adapters), including multi-node orchestration, checkpointing, failure recovery, and cost-efficient scaling.
  3. Implement and maintain end-to-end training pipelines for Large Language Models.
  4. Extend the fine-tuning and training systems with Reinforcement Fine-Tuning (RFT) and reinforcement learning.
  5. Build distillation and reinforcement learning pipelines (e.g., preference optimization, policy optimization, reward modeling).
  6. Oversee dataset, model, and experiment management: versioning, lineage, evaluation, and reproducible fine-tuning at scale.

Skills

Required

  • Advanced degree in Computer Science, Engineering, or a related field.
  • 10+ years of industry experience leading and driving impactful projects in the AI space.
  • Ability to lead and mentor a team of engineers, with exceptional interpersonal skills; able to work autonomously while proactively collaborating with stakeholders at all levels.
  • Experience in Generative AI (Large Language Models, Multimodal).
  • Hands-on experience training, fine-tuning, and aligning LLMs using Reinforcement Learning and Reinforcement Fine-Tuning (RFT) techniques.

Nice to have

  • PhD in Machine Learning, Computer Science, NLP, or a related field strongly preferred.
  • Research publications at NeurIPS, ICML, ICLR, ACL, EMNLP, or impactful preprints in the LLM post-training space.
  • Proficiency in PyTorch, and in Golang or Python for large-scale, production-level services.
  • Contributions to open-source AI projects such as vLLM or similar frameworks.
  • Performance optimizations on GPU systems and inference frameworks.

What the JD emphasized

  • 10+ years of industry experience leading and driving impactful projects in the AI space.
  • Experience in Generative AI (Large Language Models, Multimodal).
  • Hands-on experience training, fine-tuning, and aligning LLMs using Reinforcement Learning and Reinforcement Fine-Tuning (RFT) techniques.

Other signals

  • building a team
  • managed platform
  • ML models
  • LLMs
  • fine-tuning systems
  • training pipelines
  • dataset, model, and experiment management