Machine Learning Engineer

Adobe · Enterprise · San Jose, CA

Build AI systems for Adobe Journey Optimizer's (AJO) autonomous operating system, with a focus on improving product quality, accelerating issue resolution, and enhancing customer experience through intelligent automation and continuous learning. The role involves building LLM-based workflows, building knowledge systems on vector embeddings, and integrating AI capabilities with production infrastructure.

What you'd actually do

  1. Build AI-powered systems that improve the quality, reliability, and customer experience of AJO — by automating issue detection and resolution with human-in-the-loop approval, learning from operational patterns to prevent recurring failures, and providing real-time visibility into customer health and platform stability
  2. Develop intelligent knowledge systems that compound expertise over time — using vector embeddings, similarity retrieval, and pattern clustering to ensure every incident investigation builds on past learnings, making the platform progressively smarter and more self-healing
  3. Design and implement LLM-based workflows using prompt engineering, structured outputs, tool calling, and agentic reasoning patterns to create autonomous capabilities that operate safely at production scale
  4. Build evaluation frameworks to measure AI system performance: quality improvement rates, automation success rates, mean time to resolution (MTTR) reduction, and customer impact metrics
  5. Integrate AI capabilities with production infrastructure: Kubernetes, Prometheus, Splunk, GitHub, and 30+ operational data sources — creating closed-loop systems that detect, learn, and act autonomously
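
For a sense of what the similarity retrieval in item 2 could look like in practice, here is a minimal sketch. It is not taken from the posting: the incident IDs, the three-dimensional toy embeddings, and the function names are all hypothetical, and a production system would more likely use a learned embedding model and one of the vector databases listed under Skills.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query, incidents, k=2):
    """Return (incident_id, score) for the k past incidents closest to the query embedding."""
    scored = [
        (cosine_similarity(query, embedding), incident_id)
        for incident_id, embedding in incidents.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(incident_id, round(score, 3)) for score, incident_id in scored[:k]]


# Hypothetical toy data: in practice these vectors would come from an embedding model.
PAST_INCIDENTS = {
    "INC-001": [0.9, 0.1, 0.0],
    "INC-002": [0.0, 1.0, 0.2],
    "INC-003": [0.8, 0.2, 0.1],
}

matches = top_k_similar([1.0, 0.0, 0.0], PAST_INCIDENTS)
print(matches)
```

With this toy data, the two incidents whose embeddings point in roughly the same direction as the query rank ahead of the unrelated one, which is the behavior a new investigation would rely on to surface past learnings.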

Skills

Required

  • Python
  • ML frameworks (scikit-learn, PyTorch, TensorFlow, HuggingFace, or LangChain)
  • LLM APIs (OpenAI, Anthropic Claude, Azure OpenAI)
  • prompt engineering
  • vector databases and similarity search (FAISS, Pinecone, ChromaDB, MongoDB Atlas Vector Search, or similar)
  • ML concepts (embeddings, clustering, classification, evaluation metrics)
  • building APIs
  • integrating ML models into backend services (FastAPI, Flask, or similar)
  • model monitoring
  • A/B testing
  • continuous evaluation
  • safety guardrails for AI systems
  • problem-solving skills
  • attention to detail
  • communication and collaboration

Nice to have

  • Kubernetes
  • observability tools (Prometheus, Grafana, Datadog)
  • incident management systems
  • building AI agents for operational use cases

What the JD emphasized

  • AI-native platform
  • autonomous operating system
  • intelligent automation
  • continuous learning
  • LLM-based workflows
  • agentic reasoning patterns
  • autonomous capabilities
  • production scale
  • AI agents for operational use cases
