What you'd actually do

Define and drive the technical strategy for mid-training approaches that improve editing capabilities across Adobe's multimodal generative models for image, video, and audio.

Own and drive multiple complex workstreams within the mid-training stack (e.g., image-to-image editing, instruction-based editing, cross-modal editing), making key architectural and prioritization decisions.

Set technical direction for large-scale captioning pipelines and lead VLM finetuning strategy to improve multimodal understanding across visual and auditory domains.

Own end-to-end workflows for data curation, quality improvements, and distributed training, driving infrastructure decisions that unblock the broader organization.

Drive alignment across research, data, evaluation, infrastructure, pre-training, and post-training teams, influencing leadership on technical strategy and investment priorities.

Skills

Required

Ph.D. in Computer Science, Machine Learning, or a related field, with significant industry experience building and shipping large-scale ML systems
Deep expertise in modern generative architectures such as diffusion models, with experience owning end-to-end conditional generation or editing pipelines for image, video, or audio
Proven ability to architect and scale ML systems using frameworks like PyTorch, including leading distributed training infrastructure design
Extensive experience in VLM finetuning for image, video, and audio understanding, with a track record of aligning research goals with product requirements
Experience owning large-scale automated captioning pipelines across image, video, and audio datasets
Strong software engineering skills in Python and PyTorch
Excellent communication skills, with the ability to influence technical direction across teams and present strategy to senior leadership

Nice to have

mid-training approaches
editing capabilities
image-to-image editing
instruction-based editing
cross-modal editing
VLM finetuning strategy
multimodal understanding
data curation
quality improvements
distributed training
infrastructure decisions
pre-training
post-training teams

What the JD emphasized

large-scale, industry-level pre-training

mid-training

multi-modality generative models

image and video generation models

editing capabilities

multimodal generative models

image, video, and audio

mid-training stack

image-to-image editing

instruction-based editing

cross-modal editing

large-scale captioning pipelines

VLM finetuning strategy

multimodal understanding

visual and auditory domains

data curation

quality improvements

distributed training

infrastructure decisions

pre-training

post-training teams

emphasis on production-quality systems

Changing the world through digital experiences is what Adobe’s all about. We give everyone—from emerging artists to global brands—everything they need to build and deliver outstanding digital experiences. We’re passionate about empowering people to develop beautiful and powerful images, videos, and apps, transforming how companies interact with customers across every screen.

We’re on a mission to hire the very best and are committed to crafting outstanding employee experiences. Everyone is respected and has access to equal opportunity. We realize new ideas can come from anywhere in the organization, and we know the next big idea could be yours!

Adobe Firefly’s ASML group invites research scientists and engineers passionate about conditional generation and editing of large generative AI models. This role emphasizes images and videos. We strive to advance generative AI technology while guaranteeing models possess excellent quality and control.

We are especially looking for candidates experienced in large-scale, industry-level pre-training and mid-training of multi-modality generative models. This role has a direct effect on the quality of Adobe’s image and video generation models, supporting next-generation creative workflows for millions of users.

As an Applied Scientist at Adobe, you will join a world-class team of applied researchers and engineers building the future of digital experiences. You will have the opportunity to innovate across the full training stack, collaborate across data, modeling, and product, and see your work ship to customers worldwide.

Job Responsibilities

Define and drive the technical strategy for mid-training approaches that improve editing capabilities across Adobe's multimodal generative models for image, video, and audio.
Own and drive multiple complex workstreams within the mid-training stack (e.g., image-to-image editing, instruction-based editing, cross-modal editing), making key architectural and prioritization decisions.
Set technical direction for large-scale captioning pipelines and lead VLM finetuning strategy to improve multimodal understanding across visual and auditory domains.
Own end-to-end workflows for data curation, quality improvements, and distributed training, driving infrastructure decisions that unblock the broader organization.
Drive alignment across research, data, evaluation, infrastructure, pre-training, and post-training teams, influencing leadership on technical strategy and investment priorities.
Mentor junior and mid-level engineers through design reviews and technical guidance, raising the team's overall capability.

What you'll need to succeed

Ph.D. in Computer Science, Machine Learning, or a related field, with significant industry experience building and shipping large-scale ML systems.
Deep expertise in modern generative architectures such as diffusion models, with experience owning end-to-end conditional generation or editing pipelines for image, video, or audio.
Proven ability to architect and scale ML systems using frameworks like PyTorch, including leading distributed training infrastructure design.
Extensive experience in VLM finetuning for image, video, and audio understanding, with a track record of aligning research goals with product requirements.
Experience owning large-scale automated captioning pipelines across image, video, and audio datasets.
Strong software engineering skills in Python and PyTorch, with emphasis on production-quality systems.
Excellent communication skills with the ability to influence technical direction across teams and present strategy to senior leadership.

About Adobe

Adobe empowers everyone to create through innovative platforms and tools that unleash creativity, productivity, and personalized customer experiences. Adobe’s industry-leading offerings, including Adobe Acrobat Studio, Adobe Express, Adobe Firefly, Creative Cloud, Adobe Experience Platform, Adobe Experience Manager, and GenStudio, enable people and businesses to turn ideas into impact, powered by AI and driven by human ingenuity.

Our 30,000+ employees worldwide are creating the future and raising the bar as we drive the next decade of growth. We’re on a mission to hire the very best and believe in creating a company culture where all employees are empowered to make an impact. At Adobe, we believe that great ideas can come from anywhere in the organization. The next big idea could be yours.

Let’s Adobe together

At Adobe, we believe in creating a company culture where all employees are empowered to make an impact. Learn more about Adobe life, including our values and culture, focus on people, purpose and community, Adobe for All, comprehensive benefits programs, the stories we tell, the customers we serve, and how you can help us advance our mission of empowering everyone to create.

Adobe is proud to be an Equal Employment Opportunity employer. We do not discriminate based on gender, race or color, ethnicity or national origin, age, disability, religion, sexual orientation, gender identity or expression, veteran status, or any other protected characteristic.

Adobe aims to make our Careers website and recruiting process accessible to any and all users. If you have a disability or special need that requires accommodation to navigate our website or complete the application process, email accommodations@adobe.com or call +1 408-536-3015.

AI Use Guidelines for Interviews: Our interviews are designed to reflect your own skills and thinking. The use of AI or recording tools during live interviews is not permitted unless explicitly invited by the interviewer or approved in advance as part of a reasonable accommodation. If these tools are used inappropriately or in a way that misrepresents your work, your application may not move forward in the process.

At Adobe, we empower employees to innovate with AI — and we look for candidates eager to do the same. As part of the hiring experience, we provide clear guidance on where AI is encouraged during the process and where it’s restricted during live interviews. See how we think about AI in the hiring experience.

Expected Pay Range:

Our compensation reflects the cost of labor across several U.S. geographic markets, and we pay differently based on those defined markets. The U.S. pay range for this position is $164,000 - $313,300 annually. Pay within this range varies by work location and may also depend on job-related knowledge, skills, and experience. Your recruiter can share more about the specific salary range for the job location during the hiring process.

In California, the pay range for this position is $216,400 - $313,300.

At Adobe, starting salaries for sales roles are expressed as total target compensation (TTC = base + commission), and short-term incentives take the form of sales commission plans. Starting salaries for non-sales roles are expressed as base salary, and short-term incentives take the form of the Annual Incentive Plan (AIP).

In addition, certain roles may be eligible for long-term incentives in the form of a new hire equity award.

State-Specific Notices:

California:

Fair Chance Ordinances

Adobe will consider qualified applicants with arrest or conviction records for employment in accordance with state and local laws and “fair chance” ordinances.

Colorado:

Application Window Notice

If this role is open to hiring in Colorado (as listed on the job posting), the application window will remain open until at least the date and time stated above in Pacific Time, in compliance with Colorado pay transparency regulations. If this role does not have Colorado listed as a hiring location, no specific application window applies, and the posting may close at any time based on hiring needs.

Massachusetts:

Massachusetts Legal Notice

It is unlawful in Massachusetts to require or administer a lie detector test as a condition of employment or continued employment. An employer who violates this law shall be subject to criminal penalties and civil liability.