Staff Applied Scientist

Adobe · Enterprise · San Jose, CA

Staff Applied Scientist at Adobe Firefly's ASML group, focusing on conditional generation and editing of large generative AI models, particularly for images and videos. The role emphasizes large-scale pre-training and mid-training of multi-modality generative models, with direct impact on Adobe's image and video generation models for millions of users. Responsibilities include defining technical strategy for mid-training, owning complex workstreams, setting technical direction for captioning pipelines and VLM finetuning, and owning end-to-end workflows for data curation and distributed training.

What you'd actually do

  1. Define and drive the technical strategy for mid-training approaches that improve editing capabilities across Adobe's multimodal generative models for image, video, and audio.
  2. Own and drive multiple complex workstreams within the mid-training stack (e.g., image-to-image editing, instruction-based editing, cross-modal editing), making key architectural and prioritization decisions.
  3. Set technical direction for large-scale captioning pipelines and lead VLM finetuning strategy to improve multimodal understanding across visual and auditory domains.
  4. Own end-to-end workflows for data curation, quality improvements, and distributed training, driving infrastructure decisions that unblock the broader organization.
  5. Drive alignment across research, data, evaluation, infrastructure, pre-training, and post-training teams, influencing leadership on technical strategy and investment priorities.

Skills

Required

  • Ph.D. in Computer Science, Machine Learning, or a related field
  • Significant industry experience building and shipping large-scale ML systems
  • Deep expertise in modern generative architectures such as diffusion models
  • Experience owning end-to-end conditional generation or editing pipelines for image, video, or audio
  • Proven ability to architect and scale ML systems using frameworks like PyTorch
  • Experience leading distributed training infrastructure design
  • Extensive experience in VLM finetuning for image, video, and audio understanding
  • Track record of aligning research goals with product requirements
  • Experience owning large-scale automated captioning pipelines across image, video, and audio datasets
  • Strong software engineering skills in Python and PyTorch
  • Excellent communication skills
  • Ability to influence technical direction across teams and present strategy to senior leadership

Nice to have

  • mid-training approaches
  • editing capabilities
  • image-to-image editing
  • instruction-based editing
  • cross-modal editing
  • VLM finetuning strategy
  • multimodal understanding
  • data curation
  • quality improvements
  • distributed training
  • infrastructure decisions
  • pre-training
  • post-training teams

What the JD emphasized

  • large-scale, industry-level pre-training
  • mid-training
  • multi-modality generative models
  • image and video generation models
  • editing capabilities
  • multimodal generative models
  • image, video, and audio
  • mid-training stack
  • image-to-image editing
  • instruction-based editing
  • cross-modal editing
  • large-scale captioning pipelines
  • VLM finetuning strategy
  • multimodal understanding
  • visual and auditory domains
  • data curation
  • quality improvements
  • distributed training
  • infrastructure decisions
  • pre-training
  • post-training teams
  • emphasis on production-quality systems

Other signals

  • large-scale pre-training
  • mid-training
  • multimodality
  • image and video generation
  • editing capabilities
  • VLM finetuning
  • distributed training