Sr Machine Learning Engineer - ML Infrastructure & Data Platforms

Adobe · Enterprise · San Jose, CA +1

A Senior Machine Learning Engineer role focused on building infrastructure for large-scale, multimodal AI training and inference. The role involves developing distributed data loaders, data pipelines, and batch inference systems, and improving system performance, scalability, and reliability. It also includes implementing search and retrieval systems and CI/CD workflows, and partnering with researchers to turn model requirements into scalable systems.

What you'd actually do

  1. Build distributed data loaders to support large-scale training workflows
  2. Develop data pipelines for ingesting, transforming, and preparing multimodal datasets
  3. Design batch inference systems for high-volume data processing across GPU environments
  4. Improve system performance, scalability, and reliability using distributed computing and data processing tools (e.g., Ray, Spark, DuckDB)
  5. Implement search and retrieval systems using vector databases and embedding-based approaches

Skills

Required

  • 8+ years of experience building and operating distributed systems or ML infrastructure in production
  • Experience working with large-scale data pipelines or inference systems
  • Strong programming skills in Python
  • Foundation in software engineering principles
  • Experience with ML frameworks such as PyTorch or TensorFlow
  • Familiarity with distributed computing tools (e.g., Ray, Spark, Dask, or similar)
  • Experience working with cloud platforms such as AWS or Azure
  • Understanding of MLOps practices, including CI/CD and deployment workflows
  • Ability to communicate clearly and collaborate with cross-functional teams

Nice to have

  • Experience working with multimodal data (images, video, text)
  • Familiarity with vector databases or semantic search systems

What the JD emphasized

  • large-scale multimodal AI training and inference
  • large-scale data pipelines or inference systems
  • multimodal data (images, video, text)

Other signals

  • distributed systems
  • data engineering
  • train and deploy models at scale
  • billions of data points
  • large GPU environments