Applied Scientist

Adobe · Enterprise · San Jose, CA

Research scientist focused on conditional generation and editing with large multimodal generative AI models (image, video, audio) for Adobe Firefly. The role emphasizes large-scale pre-training and mid-training, with direct impact on creative workflows for millions of users. Responsibilities include designing and implementing mid-training approaches, owning components such as image-to-image editing, building captioning pipelines, supporting VLM finetuning, and developing scalable workflows for data curation and distributed training.

What you'd actually do

  1. Contribute to the design, implementation, and evaluation of mid-training approaches that improve editing capabilities for Adobe's multimodal generative models across image, video, and audio.
  2. Own well-defined components within the mid-training stack — such as image-to-image editing or instruction-based editing — designing and running experiments to test hypotheses and identify quality gaps.
  3. Build and maintain large-scale captioning pipelines and support VLM finetuning efforts to improve multimodal understanding across visual and auditory domains.
  4. Assist in building scalable workflows for data curation, quality improvements, and distributed training, applying research insights from diffusion models and large-scale training to practical model improvement.
  5. Collaborate closely with research, data, evaluation, infrastructure, pre-training, and post-training teams, contributing to knowledge sharing and documentation of experiments, datasets, and training approaches.

Skills

Required

  • Master’s or Ph.D. degree in Computer Science, Machine Learning, or a related field.
  • Solid understanding of modern generative architectures such as diffusion models.
  • Familiarity with conditional generation or editing methods for image, video, or audio tasks.
  • Experience implementing machine learning models using modern deep learning frameworks (e.g., PyTorch).
  • Experience with large-scale or distributed training workflows.
  • Experience or research background in VLM finetuning with a focus on image, video, and audio understanding.
  • Familiarity with captioning for large-scale data.
  • Strong coding and prototyping ability in Python and PyTorch.
  • Excellent communication skills and ability to collaborate across cross-functional teams.

What the JD emphasized

  • large-scale, industry-level pre-training
  • mid-training
  • multimodal generative models
  • image, video, and audio

Other signals

  • conditional generation
  • editing capabilities
  • multimodal generative models
  • large-scale pre-training
  • mid-training