Principal Scientist - Applied Research, ASML (Multimodal Foundation Models)

Adobe · Enterprise · Noida, India +1

This role focuses on applied research for multimodal foundation models, specifically concerning the scaling, quality, and innovation of training data across image, video, and audio. The Principal Scientist will guide a team in architecture and training strategies, pioneer data ablations, lead research, optimize distributed training pipelines, and translate research into production models for Adobe's creative tools.

What you'd actually do

  1. Multimodal Model Training: Guide the Applied Research team on architecture and training strategies for models that bridge text, pixels, temporal frames, and audio waveforms.
  2. Cross-Modal Data Ablations: Pioneer our approach to data ablations across diverse modalities. You will build rigorous experiments to understand how dataset composition, cross-modal alignment (e.g., text-to-video, image-to-audio), pruning, and blending impact final model capabilities and commercial viability.
  3. Research Leadership: Guide a world-class team of applied scientists and ML researchers, encouraging rapid experimentation, deep scientific rigor, and groundbreaking progress in multimodal representation learning.
  4. Scale & Infrastructure: Partner closely with Core Engineering and Infrastructure teams to optimize distributed training pipelines across massive GPU clusters, ensuring efficient resource utilization and training stability for complex multimodal workloads.
  5. Bridge Research to Product: Translate groundbreaking AI research into production-ready foundation models that power Adobe's flagship tools (Premiere Pro, After Effects, Photoshop, GenStudio).

Skills

Required

  • 10+ years of experience in Applied Machine Learning or AI Research
  • 4+ years mentoring and scaling high-performing research teams
  • Hands-on experience training multimodal models from scratch (e.g., Diffusion Transformers (DiT), Multimodal LLMs, joint embedding architectures like CLIP, or generative video/audio models)
  • Experience crafting and delivering data ablation studies
  • Architectural understanding of training massive models using frameworks like PyTorch, Megatron-LM, DeepSpeed, or FSDP
  • Ability to turn ambiguous research questions into structured experiments with measurable outcomes
  • PhD or MS in Computer Science, Artificial Intelligence, Machine Learning, or a closely related highly quantitative field, or equivalent experience

Nice to have

  • commercial safety
  • creative quality

What the JD emphasized

  • multimodal training data
  • multimodal foundation models
  • multimodal data
  • distributed training at scale

Other signals

  • multimodal foundation models
  • training data
  • research leadership