Senior Research Scientist - Machine Learning Systems & Efficiency Engineer

Adobe · Enterprise · Seattle, WA +2

Senior ML Engineer role focused on optimizing inference performance, latency, and cost efficiency for image-editing applications. The role calls for deep expertise in ML systems, computer vision, distributed inference, and performance optimization. It also involves working closely with research, product, and infrastructure teams to build scalable, cost-aware ML systems that are deployed in production.

What you'd actually do

  1. Design and optimize high-throughput, low-latency inference systems.
  2. Write and maintain high-performance GPU kernels using Triton or CUDA to accelerate custom model layers and critical workloads.
  3. Conduct deep performance analysis using tools such as PyTorch Profiler and NVIDIA Nsight to identify bottlenecks in compute, memory, and communication.
  4. Partner with infrastructure teams to design scalable, reliable distributed serving systems across heterogeneous hardware (e.g., NVIDIA A100, H100, and B200 GPUs, as well as CPUs).
  5. Establish and track efficiency metrics such as cost per million inferences.
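The cost metric in item 5 reduces to simple arithmetic. The following is a hedged sketch, not Adobe's definition of the metric. It assumes cost per million inferences equals the hourly instance price divided by effective hourly throughput (measured inferences per second × 3600 × utilization), scaled to one million. `ServingConfig`, `cost_per_million_inferences`, the `utilization` factor, and every number are hypothetical placeholders. None of them are real prices or benchmark results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServingConfig:
    """Hypothetical description of one serving deployment (illustrative only)."""
    name: str
    hourly_cost_usd: float      # price of one instance per hour (assumed input)
    throughput_per_sec: float   # sustained inferences/second from a benchmark
    utilization: float = 1.0    # assumed fraction of paid time spent serving traffic


def cost_per_million_inferences(cfg: ServingConfig) -> float:
    """Return the USD cost of serving one million inferences on `cfg`.

    Idle time is still paid for, so throughput is scaled by utilization
    before dividing the hourly price by it.
    """
    if cfg.throughput_per_sec <= 0 or not 0 < cfg.utilization <= 1:
        raise ValueError("throughput must be positive and utilization in (0, 1]")
    inferences_per_hour = cfg.throughput_per_sec * 3600 * cfg.utilization
    return cfg.hourly_cost_usd / inferences_per_hour * 1_000_000


if __name__ == "__main__":
    # Placeholder numbers, not real hardware prices or measured throughput.
    configs = [
        ServingConfig("gpu-a", hourly_cost_usd=4.0, throughput_per_sec=50.0, utilization=0.5),
        ServingConfig("gpu-b", hourly_cost_usd=10.0, throughput_per_sec=200.0, utilization=0.5),
    ]
    for cfg in configs:
        print(f"{cfg.name}: ${cost_per_million_inferences(cfg):.2f} per 1M inferences")
```

With these placeholder inputs, `gpu-b` costs more per hour but is cheaper per million inferences ($27.78 vs. $44.44). This shows how a per-inference metric can rank hardware differently than hourly price does.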

Skills

Required

  • Python
  • C++
  • distributed inference
  • GPU architecture
  • performance profiling
  • inference serving workloads
  • large-scale inference
  • distributed frameworks
  • runtime systems
  • inference compilation
  • optimization tools
  • system-level performance tradeoffs
  • compute
  • memory
  • I/O subsystems
  • benchmarking
  • system efficiency
  • scalability
  • reliability

Nice to have

  • Triton
  • CUDA
  • TensorRT
  • ONNX Runtime
  • AOTI
  • operator fusion
  • graph-level optimization
  • PyTorch Profiler
  • NVIDIA Nsight
  • CUDA tooling
  • NCCL
  • Docker
  • Kubernetes
  • Transformers
  • multimodal models
  • Mixture-of-Experts (MoE)
  • Diffusion Transformers (DiT)

What the JD emphasized

  • production-ready improvements
  • production ML systems
  • production environments

Other signals

  • optimize inference performance
  • reduce inference cost
  • improve inference latency
  • production ML systems
  • GPU utilization