Machine Learning Engineer

Adobe · Enterprise · San Jose, CA +1

Machine Learning Data Engineer role at Adobe, focused on building foundational infrastructure for large-scale multimodal AI training and inference. The role involves developing distributed data loaders, feature enrichment pipelines, and dataset management systems that support foundation model training at petabyte scale, across billions of multimodal assets.

What you'd actually do

  1. Contribute to building and maintaining distributed training data loaders that handle multi-source data ingestion, temporal sampling, and real-time transformations for large-scale model training workflows.
  2. Help implement and maintain feature enrichment pipelines and dataset registry systems that support multimodal model training across images, video, documents, and text.
  3. Build and maintain batch inference pipelines for large-scale feature extraction, processing assets through distributed GPU clusters with queue management and fault tolerance.
  4. Develop data processing systems with Ray, Apache Spark, DuckDB, or similar data processing tools, covering SQL-based data ingestion and Apache Arrow-based storage formats.
  5. Support semantic search capabilities and vector database infrastructure (e.g., OpenSearch, LanceDB) for dataset discovery and embedding-based retrieval.

Skills

Required

  • Python
  • distributed systems
  • data engineering
  • Apache Spark
  • Ray
  • PyTorch
  • TensorFlow
  • Docker
  • CI/CD

Nice to have

  • ML experience
  • MLOps practices
  • batch inference architectures
  • large-scale data processing patterns
  • MS degree

What the JD emphasized

  • building and operating distributed systems or data infrastructure in production environments
  • distributed computing concepts
  • large-scale multimodal AI training
  • petabyte scale
  • billions of images, videos, and multimodal content

Other signals

  • generative AI model development