Senior Applied Scientist - Machine Learning Systems Engineer - Photoshop

A Senior ML Systems & Efficiency Engineer role on the Photoshop ART R&D team, focused on optimizing inference performance, latency, and cost efficiency for image editing applications. The role requires deep expertise in AI/ML systems, computer vision, distributed inference, and performance optimization, with a mandate to deliver production-ready ML systems at lower cost and higher efficiency. Responsibilities include designing and optimizing inference systems, developing high-performance GPU kernels, conducting performance profiling, collaborating on distributed serving systems, and establishing cost-aware ML engineering practices.

Skills

Required

Python
C++
GPU architecture understanding
performance diagnosis
distributed systems
high-performance systems development
Triton or CUDA for performance-critical workloads
rigorous measurement and benchmarking
system efficiency, scalability, and reliability in production environments

Nice to have

Master’s or PhD in Computer Science, Electrical Engineering, or a related field, with a focus on ML systems, distributed systems, or HPC
Triton
vLLM
SGLang
xDiT
TensorRT
ONNX Runtime
AOTI
operator fusion
graph-level optimization
PyTorch Profiler
NVIDIA Nsight
CUDA tooling
NCCL
Docker
Kubernetes
Transformers
multimodal models
Mixture-of-Experts (MoE)
Diffusion Transformers (DiT)

The Opportunity

Photoshop ART is seeking a Senior Machine Learning (ML) Systems & Efficiency Engineer to join our R&D team focused on delivering practical, production-ready improvements in inference performance, latency, and cost efficiency across image editing applications. This role sits at the intersection of model architecture, systems, inference runtimes, and services, with a clear mandate: deliver high-quality ML systems at substantially lower cost and higher efficiency. Individuals in this role are expected to have deep expertise in areas such as Artificial Intelligence (AI), ML systems, and computer vision. Strong preference will be given to candidates with experience in distributed inference, multimodal model profiling, and performance optimization. You will work closely with research, product, and infrastructure teams to influence model design decisions, improve GPU utilization, and build scalable, cost-aware ML systems deployed in production.

This is a hands-on, high-leverage role where a single engineer can drive outsized impact, potentially saving millions of dollars in compute costs. The ideal candidate will have a strong interest in developing practical innovations that advance Adobe products.

Job Responsibilities

Inference & Serving Optimization: Design and optimize high-throughput, low-latency inference systems. Optimize model architectures for deployment and runtime efficiency using techniques such as distillation, pruning, quantization, and Mixture-of-Experts (MoE). Implement advanced serving strategies, including batching, caching (KV, semantic, embedding), and quantization (FP8/INT8), along with distributed inference strategies such as data, tensor, pipeline, expert, and hybrid parallelism, with a focus on balancing computation and communication efficiency. Explore training or fine-tuning approaches when they directly lead to more efficient inference, simpler deployment, or improved runtime performance.
Kernel Development & System Acceleration: Write and maintain high-performance GPU kernels using Triton or CUDA to accelerate custom model layers and critical workloads. Improve GPU utilization through kernel fusion, asynchronous pipelines, and optimized scheduling strategies.
Performance Profiling & System Optimization: Conduct deep performance analysis using tools such as PyTorch Profiler and NVIDIA Nsight to identify bottlenecks in compute, memory, and communication. Optimize end-to-end system performance across inference workloads.
Distributed Systems & Infrastructure Collaboration: Partner with infrastructure teams to design scalable and reliable distributed serving systems across heterogeneous hardware environments (e.g., A100, H100, B200, CPU). Contribute to resource scheduling, GPU pooling, and elastic workload management.
Cost-Aware ML Engineering: Establish and track efficiency metrics such as cost per million inferences. Build benchmarking frameworks and dashboards to guide tradeoffs among quality, latency, and compute cost, enabling data-driven system and product decisions.
Technical Leadership & Best Practices: Serve as a trusted technical advisor to research and product teams on efficiency tradeoffs. Define best practices for scalable and cost-efficient ML development and mentor engineers on performance-oriented systems design.

What You’ll Need to Succeed

Education: Master’s or PhD in Computer Science, Electrical Engineering, or a related field, with a focus on machine learning systems, distributed systems, or high-performance computing.
Distributed Inference & Serving Expertise: Hands-on experience implementing and scaling large-scale inference or serving workloads using distributed frameworks and runtime systems (e.g., Triton, vLLM, SGLang, xDiT, or similar). Experience applying inference compilation and optimization tools (e.g., TensorRT, ONNX Runtime, AOTI), including techniques such as operator fusion and graph-level optimization, with a strong understanding of system-level performance tradeoffs.
GPU & Performance Engineering Skills: Strong understanding of GPU architecture (e.g., memory hierarchy, compute throughput, communication bandwidth) and practical experience diagnosing performance bottlenecks across compute, memory, and I/O subsystems.
Programming & Systems Development: Proficiency in Python and C++, with experience building high-performance or distributed systems. Familiarity with CUDA or Triton for performance-critical workloads is highly desirable.
Data-Driven Engineering Mindset: Demonstrated ability to make engineering decisions based on rigorous measurement and benchmarking, with a focus on improving system efficiency, scalability, and reliability in production environments.

Preferred Experience

ML Frameworks & Tooling: Experience contributing to or maintaining performance- or efficiency-focused libraries or systems. Hands-on experience with:
- Open-source serving frameworks (e.g., vLLM, SGLang, xDiT, or similar)
- Inference compilation tools (e.g., TensorRT, Triton, AOTI, or equivalent) and techniques such as operator fusion or graph-level optimization
- GPU profiling and performance analysis tools (e.g., PyTorch Profiler, NVIDIA Nsight, CUDA tooling)
Distributed Systems & Communication: Exposure to low-level communication libraries such as NCCL and a practical understanding of collective operations (e.g., AllReduce, AllGather) in large-scale distributed serving environments.
Containerization & Cluster Operations: Familiarity with containerized workflows (Docker, Kubernetes) and job scheduling in headless Linux environments, including experience operating production ML workloads on shared GPU clusters.
Model Architectures: Working knowledge of model architectures such as Transformers, multimodal models, Mixture-of-Experts (MoE), or Diffusion Transformers (DiT).

About Adobe

Adobe empowers everyone to create through innovative platforms and tools that unleash creativity, productivity, and personalized customer experiences. Adobe’s industry-leading offerings, including Adobe Acrobat Studio, Adobe Express, Adobe Firefly, Creative Cloud, Adobe Experience Platform, Adobe Experience Manager, and GenStudio, enable people and businesses to turn ideas into impact, powered by AI and driven by human ingenuity.

Our 30,000+ employees worldwide are creating the future and raising the bar as we drive the next decade of growth. We’re on a mission to hire the very best and believe in creating a company culture where all employees are empowered to make an impact. At Adobe, we believe that great ideas can come from anywhere in the organization. The next big idea could be yours.

Let’s Adobe together

At Adobe, we believe in creating a company culture where all employees are empowered to make an impact. Learn more about Adobe life, including our values and culture, focus on people, purpose and community, Adobe for All, comprehensive benefits programs, the stories we tell, the customers we serve, and how you can help us advance our mission of empowering everyone to create.

Adobe is proud to be an Equal Employment Opportunity employer. We do not discriminate based on gender, race or color, ethnicity or national origin, age, disability, religion, sexual orientation, gender identity or expression, veteran status, or any other protected characteristic.

Adobe aims to make our Careers website and recruiting process accessible to any and all users. If you have a disability or special need that requires accommodation to navigate our website or complete the application process, email accommodations@adobe.com or call +1 408-536-3015.

AI Use Guidelines for Interviews: Our interviews are designed to reflect your own skills and thinking. The use of AI or recording tools during live interviews is not permitted unless explicitly invited by the interviewer or approved in advance as part of a reasonable accommodation. If these tools are used inappropriately or in a way that misrepresents your work, your application may not move forward in the process.

At Adobe, we empower employees to innovate with AI — and we look for candidates eager to do the same. As part of the hiring experience, we provide clear guidance on where AI is encouraged during the process and where it’s restricted during live interviews. See how we think about AI in the hiring experience.

Expected Pay Range:

Our compensation reflects the cost of labor across several U.S. geographic markets, and we pay differently based on those defined markets. The U.S. pay range for this position is $164,000 - $313,300 annually. Pay within this range varies by work location and may also depend on job-related knowledge, skills, and experience. Your recruiter can share more about the specific salary range for the job location during the hiring process.

In California, the pay range for this position is $216,400 - $313,300. In Washington, the pay range for this position is $204,800 - $296,600.

At Adobe, starting salaries for sales roles are expressed as total target compensation (TTC = base + commission), and short-term incentives take the form of sales commission plans. For non-sales roles, starting salaries are expressed as base salary, and short-term incentives take the form of the Annual Incentive Plan (AIP).

In addition, certain roles may be eligible for long-term incentives in the form of a new hire equity award.

State-Specific Notices:

California:

Fair Chance Ordinances

Adobe will consider qualified applicants with arrest or conviction records for employment in accordance with state and local laws and “fair chance” ordinances.

Colorado:

Application Window Notice

If this role is open to hiring in Colorado (as listed on the job posting), the application window will remain open until at least the date and time stated above in Pacific Time, in compliance with Colorado pay transparency regulations. If this role does not have Colorado listed as a hiring location, no specific application window applies, and the posting may close at any time based on hiring needs.

Massachusetts:

Massachusetts Legal Notice

It is unlawful in Massachusetts to require or administer a lie detector test as a condition of employment or continued employment. An employer who violates this law shall be subject to criminal penalties and civil liability.