**About the Role**

We are looking for a Machine Learning Data Engineer to join our Applied Science Data Frameworks team responsible for building the foundational infrastructure that powers large-scale multimodal AI training and inference. This role is ideal for someone with strong distributed systems and data engineering fundamentals who is eager to work in an ML-adjacent environment—contributing to training data loaders, distributed inference frameworks, feature enrichment pipelines, and dataset management systems that enable ML teams to train foundation models at petabyte scale. You'll work on high-impact projects involving distributed data loading for PyTorch training workloads, batch inference pipelines for feature enrichment, semantic search infrastructure for dataset discovery, and production-grade ML data pipelines that support generative AI model development. Your systems will process billions of images, videos, and multimodal content across large-scale GPU clusters. If you're excited about building distributed data frameworks, optimizing data pipelines at scale, and growing your expertise in ML infrastructure, we'd love to hear from you.

What You'll Do

Contribute to building and maintaining distributed training data loaders that handle multi-source data ingestion, temporal sampling, and real-time transformations for large-scale model training workflows.
Help implement and maintain feature enrichment pipelines and dataset registry systems that support multimodal model training across images, video, documents, and text.
Build and maintain batch inference pipelines for large-scale feature extraction, processing assets through distributed GPU clusters with queue management and fault tolerance.
Develop data processing systems using frameworks like Ray, Apache Spark, DuckDB, or similar distributed computing tools for SQL-based data ingestion and Apache Arrow-based storage formats.
Support semantic search capabilities and vector database infrastructure (e.g., OpenSearch, LanceDB) for dataset discovery and embedding-based retrieval.
Contribute to CI/CD infrastructure for ML systems including self-hosted runner management, Docker image builds, automated testing pipelines, and deployment automation.
Collaborate with ML research teams to translate training requirements into reliable, scalable data loading and preprocessing infrastructure.
Write reusable framework components, SDKs, and documentation to help accelerate platform adoption across modeling teams.
Optimize data pipeline performance across dimensions like startup latency, throughput, memory footprint, and GPU utilization.
Contribute to observability and reliability standards for production data systems supporting 24/7 training workloads.

What You Need to Succeed

3–4 years of professional experience building and operating distributed systems or data infrastructure in production environments.
Solid understanding of distributed computing concepts and experience with frameworks like Apache Spark, Ray, Dask, or equivalent.
Familiarity with cloud platforms (AWS or Azure) and data platforms such as Databricks or Spark.
Proficiency in Python and strong software engineering fundamentals — system design, data structures, algorithms.
Familiarity with ML frameworks such as PyTorch or TensorFlow; hands-on ML experience is a plus but not required.
Basic familiarity with MLOps practices including CI/CD pipelines, containerization (Docker), and deployment automation.
Familiarity with batch inference architectures and large-scale data processing patterns is a plus.
Bachelor's degree in Computer Science, Engineering, or a related field; MS is a plus.
Strong communication skills and ability to collaborate across engineering and research teams.

About Adobe

Adobe empowers everyone to create through innovative platforms and tools that unleash creativity, productivity and personalized customer experiences. Adobe’s industry-leading offerings, including Adobe Acrobat Studio, Adobe Express, Adobe Firefly, Creative Cloud, Adobe Experience Platform, Adobe Experience Manager, and GenStudio, enable people and businesses to turn ideas into impact, powered by AI and driven by human ingenuity.

Our 30,000+ employees worldwide are creating the future and raising the bar as we drive the next decade of growth. We’re on a mission to hire the very best and believe in creating a company culture where all employees are empowered to make an impact. At Adobe, we believe that great ideas can come from anywhere in the organization. The next big idea could be yours.

**Let’s Adobe together**

At Adobe, we believe in creating a company culture where all employees are empowered to make an impact. Learn more about Adobe life, including our values and culture, focus on people, purpose and community, Adobe for All, comprehensive benefits programs, the stories we tell, the customers we serve, and how you can help us advance our mission of empowering everyone to create.

Adobe is proud to be an Equal Employment Opportunity employer. We do not discriminate based on gender, race or color, ethnicity or national origin, age, disability, religion, sexual orientation, gender identity or expression, veteran status, or any other protected characteristic.

Adobe aims to make our Careers website and recruiting process accessible to any and all users. If you have a disability or special need that requires accommodation to navigate our website or complete the application process, email accommodations@adobe.com or call +1 408-536-3015.

AI Use Guidelines for Interviews: Our interviews are designed to reflect your own skills and thinking. The use of AI or recording tools during live interviews is not permitted unless explicitly invited by the interviewer or approved in advance as part of a reasonable accommodation. If these tools are used inappropriately or in a way that misrepresents your work, your application may not move forward in the process.

At Adobe, we empower employees to innovate with AI — and we look for candidates eager to do the same. As part of the hiring experience, we provide clear guidance on where AI is encouraged during the process and where it’s restricted during live interviews.

Expected Pay Range:

Our compensation reflects the cost of labor across several U.S. geographic markets, and we pay differently based on those defined markets. The U.S. pay range for this position is $151,800 - $265,350 annually. Pay within this range varies by work location and may also depend on job-related knowledge, skills, and experience. Your recruiter can share more about the specific salary range for the job location during the hiring process.

In California, the pay range for this position is $183,300 - $265,350.

In Washington, the pay range for this position is $165,600 - $239,725.

At Adobe, starting salaries for sales roles are expressed as total target compensation (TTC = base + commission), and short-term incentives take the form of sales commission plans. Starting salaries for non-sales roles are expressed as base salary, and short-term incentives take the form of the Annual Incentive Plan (AIP).

In addition, certain roles may be eligible for long-term incentives in the form of a new hire equity award.

State-Specific Notices:

California:

Fair Chance Ordinances

Adobe will consider qualified applicants with arrest or conviction records for employment in accordance with state and local laws and “fair chance” ordinances.

Colorado:

Application Window Notice

If this role is open to hiring in Colorado (as listed on the job posting), the application window will remain open until at least the date and time stated above in Pacific Time, in compliance with Colorado pay transparency regulations. If this role does not have Colorado listed as a hiring location, no specific application window applies, and the posting may close at any time based on hiring needs.

Massachusetts:

Massachusetts Legal Notice

It is unlawful in Massachusetts to require or administer a lie detector test as a condition of employment or continued employment. An employer who violates this law shall be subject to criminal penalties and civil liability.