Applied Scientist II, Alexa for Shopping Science UK

Amazon · Big Tech · London, United Kingdom · Data Science

Applied Scientist II role focused on developing and optimizing LLM/SLM-powered conversational experiences for Alexa Shopping. The work covers designing and implementing LLM agents, instruction design, contextual grounding, MCP tool use, agent and multi-agent systems, context engineering, model fine-tuning, and evaluation frameworks. It also involves applying ML/DL techniques for last-mile improvements in ranking, relevance, personalization, and multimodal understanding, and designing agentic architectures that account for quality, latency, and reliability at scale. The role requires hands-on analysis of multimodal interaction datasets, statistical methods for evaluation and optimization, and collaboration with product and engineering teams.

What you'd actually do

  1. develop and maintain LLM agents, including automated eval pipelines, LLM-as-a-judge methodologies, rubric design, and dataset curation to measure nuanced aspects of response quality.
  2. experiment with techniques such as retrieval augmentation, context enrichment, prompt decomposition, and model fine-tuning or post-training strategies, if and when applicable.
  3. lead post-training of small language models (SLMs) — including supervised fine-tuning, preference optimisation, and distillation — to deliver low-latency conversational and shopping experiences.
  4. apply machine learning and deep learning techniques as last-mile improvements to shopping experiences, which may span ranking, relevance, personalisation, and multimodal understanding.
  5. design and evaluate agentic architectures that balance the needs of diverse shopping use cases, making principled choices across paradigms such as single-agent and multi-agent systems, memory management strategies, and tool orchestration to optimise for quality, latency, and reliability at scale.

Skills

Required

  • PhD, or a Master's degree and experience, in CS, CE, ML or a related field
  • Experience in state-of-the-art deep learning model architecture design, training and optimization, and model pruning
  • Experience in patents or publications at top-tier peer-reviewed conferences or journals
  • Experience programming in Java, C++, Python or a related language
  • Experience in any of the following areas: algorithms and data structures, parsing, numerical optimization, data mining, parallel and distributed computing, high-performance computing
  • Experience in building machine learning models for business applications

Nice to have

  • Hands-on expertise in large language model post-training and an in-depth understanding of the algorithms, including both supervised fine-tuning and large-scale reinforcement learning, especially for large-scale distributed training
  • Strong publication record at top-tier NLP/LLM conferences such as NeurIPS, ICLR, ICML, EMNLP, ACL, and NAACL, with 500+ citations.

What the JD emphasized

  • strong machine learning background
  • LLM/SLM
  • agent/multi-agent systems
  • model fine-tuning
  • evaluation frameworks
  • conversational AI performance at Amazon scale
  • retrieval augmentation
  • context enrichment
  • prompt decomposition
  • model fine-tuning or post-training strategies
  • post-training of small language models (SLMs)
  • supervised fine-tuning
  • preference optimisation
  • distillation
  • low-latency conversational and shopping experiences
  • applied machine learning and deep learning techniques
  • ranking, relevance, personalisation, and multimodal understanding
  • agentic architectures
  • single-agent and multi-agent systems
  • memory management strategies
  • tool orchestration
  • quality, latency, and reliability at scale
  • large-scale multimodal interaction datasets
  • conversational AI systems
  • response quality and customer experience
  • statistical methods, experimentation, and data-driven analysis
  • measuring, evaluating, and optimizing large language model (LLM)-based shopping assistant systems
  • structured and unstructured contextual signals
  • conversational relevance, grounding, customer satisfaction, and downstream business impact
  • model evaluation and deployment
  • technical and non-technical audiences
  • conversational AI system
  • agentic
  • multimodal user queries
  • text, image, audio and video
  • Natural Language Processing
  • gen AI
  • Information Retrieval
  • Machine/Deep Learning
  • Data Mining
  • internal and external scientific communities
  • state-of-the-art deep learning models architecture design
  • deep learning training and optimization
  • model pruning
  • patents or publications at top-tier peer-reviewed conferences or journals
  • building machine learning models for business application
  • large language model post-training
  • large scale reinforcement learning
  • large scale distributed training
  • top-tier NLP/LLM conferences

Other signals

  • LLM/SLM conversational experiences
  • agent/multi-agent systems
  • retrieval augmentation
  • model fine-tuning
  • post-training strategies
  • ranking, relevance, personalisation
  • multimodal understanding
  • agentic architectures
  • tool orchestration