Applied Scientist II, Alexa for Shopping Science UK

Amazon · Big Tech · London, United Kingdom · Data Science

Applied Scientist II role focused on developing and optimizing LLM/SLM powered conversational experiences for Alexa Shopping. This involves designing and implementing LLM agents, instruction design, contextual grounding, using MCP tools, agent/multi-agent systems, context engineering, model fine-tuning, and evaluation frameworks. The role also involves applying ML/DL techniques for last-mile improvements in ranking, relevance, personalization, and multimodal understanding, and designing agentic architectures with considerations for quality, latency, and reliability at scale. It requires hands-on analysis of multimodal interaction datasets, using statistical methods for evaluation and optimization, and collaborating with product and engineering teams.

What you'd actually do

develop and maintain LLM agents, including automated eval pipelines, LLM-as-a-judge methodologies, rubric design, and dataset curation to measure nuanced aspects of response quality.
experiment with techniques such as retrieval augmentation, context enrichment, prompt decomposition, and model fine-tuning or post-training strategies, if and when applicable.
lead post-training of small language models (SLMs) — including supervised fine-tuning, preference optimisation, and distillation — to deliver low-latency conversational and shopping experiences.
apply machine learning and deep learning techniques as last-mile improvements to shopping experiences, which might span ranking, relevance, personalisation, and multimodal understanding.
design and evaluate agentic architectures that balance the needs of diverse shopping use cases, making principled choices across paradigms such as single-agent and multi-agent systems, memory management strategies, and tool orchestration to optimise for quality, latency, and reliability at scale.

Skills

Required

PhD, or a Master's degree and experience in CS, CE, ML or related field
Experience in state-of-the-art deep learning model architecture design, deep learning training and optimization, and model pruning
Experience in patents or publications at top-tier peer-reviewed conferences or journals
Experience programming in Java, C++, Python or related language
Experience in any of the following areas: algorithms and data structures, parsing, numerical optimization, data mining, parallel and distributed computing, high-performance computing
Experience in building machine learning models for business application

Nice to have

Expertise in hands-on large language model post-training and an in-depth understanding of the algorithms, including both supervised fine-tuning and large-scale reinforcement learning, especially for large-scale distributed training
Strong publication record in top-tier NLP/LLM conferences such as NeurIPS, ICLR, ICML, EMNLP, ACL, NAACL, with 500+ citations.

What the JD emphasized

strong machine learning background
LLM/SLM
agent/multi-agent systems
model fine-tuning
evaluation frameworks
conversational AI performance at Amazon scale
retrieval augmentation
context enrichment
prompt decomposition
model fine-tuning or post-training strategies
post-training of small language models (SLMs)
supervised fine-tuning
preference optimisation
distillation
low-latency conversational and shopping experiences
applied machine learning and deep learning techniques
ranking, relevance, personalisation, and multimodal understanding
agentic architectures
single-agent and multi-agent systems
memory management strategies
tool orchestration
quality, latency, and reliability at scale
large-scale multimodal interaction datasets
conversational AI systems
response quality and customer experience
statistical methods, experimentation, and data-driven analysis
measuring, evaluating, and optimizing large language model (LLM)-based shopping assistant systems
structured and unstructured contextual signals
conversational relevance, grounding, customer satisfaction, and downstream business impact
model evaluation and deployment
technical and non-technical audiences
conversational AI system
agentic
multimodal user queries
text, image, audio and video
Natural Language Processing
gen AI
Information Retrieval
Machine/Deep Learning
Data Mining
internal and external scientific communities
state-of-the-art deep learning models architecture design
deep learning training and optimization
model pruning
patents or publications at top-tier peer-reviewed conferences or journals
building machine learning models for business application
large language model post-training
large scale reinforcement learning
large scale distributed training
top-tier NLP/LLM conferences

Other signals

LLM/SLM conversational experiences
agent/multi-agent systems
retrieval augmentation
model fine-tuning
post-training strategies
ranking, relevance, personalisation
multimodal understanding
agentic architectures
tool orchestration

● Active

Posted 5d ago · 5 days open

AI score: 9/10
Stage: Agent Post-train
Location: London, United Kingdom
Role: Mid · Applied
Function: Engineering
Domain: consumer
Team: Alexa for Shopping Science
Maturity: Scaling

Skills

Agents & Autonomy

Agentic Systems
Context Engineering
Multi-Agent Systems

Applied ML Domains

Data Science

Data Engineering

Data Pipelines

General Experience & Skills

Data-Driven Decision Making

LLM & Foundation Models

LLM Evaluation & Grading
Large Language Models (LLMs)
Prompt Engineering
Retrieval-Augmented Generation (RAG)

ML Ops & Evaluation

A/B Testing
Fine-Tuning
LLM Evaluation
Model Lifecycle Management

ML Techniques

Few-Shot Learning
Language Modeling
Machine Learning
Supervised Fine-Tuning (SFT)

Math & Foundations

Causal Inference

NLP & Language

Conversational AI
Natural Language Processing

Research & Credentials

Published Research

Full job description

We are looking for a passionate, talented, and inventive Applied Scientist with a strong machine learning background to help build industry-leading language technology powering Alexa for Shopping, our AI-driven search and shopping assistant, helping customers with their shopping tasks at every step of their shopping journey.

This innovative role focuses on developing and optimizing language model-powered (LLM/SLM) conversational experiences. The core emphasis is to get the best performance out of LLMs/SLMs via careful and methodical instruction design, contextual grounding, informed choices of MCP tools and agent/multi-agent systems, context engineering, model fine-tuning, evaluation frameworks, and experimentation to systematically improve quality, robustness, and customer impact. The work combines scientific rigor with product intuition to raise the bar for conversational AI performance at Amazon scale.

Our mission in conversational shopping is to make it easy for customers to find and discover the best products to meet their needs by helping with their product research, providing comparisons and recommendations, answering product questions, enabling shopping directly from images or videos, providing visual inspiration, and more. We do this by leveraging advanced analytics, Natural Language Processing (NLP), Machine Learning (ML), A/B testing, causal inference, and data-driven insights to continuously improve our systems.

Key job responsibilities

As an Applied Scientist on our team, you will develop and maintain LLM agents, including automated eval pipelines, LLM-as-a-judge methodologies, rubric design, and dataset curation to measure nuanced aspects of response quality.

You will partner with the wider org to experiment with techniques such as retrieval augmentation, context enrichment, prompt decomposition, and model fine-tuning or post-training strategies, if and when applicable. Where latency and cost constraints demand it, you will lead post-training of small language models (SLMs) — including supervised fine-tuning, preference optimisation, and distillation — to deliver low-latency conversational and shopping experiences.

You will apply machine learning and deep learning techniques as last-mile improvements to shopping experiences, which might span ranking, relevance, personalisation, and multimodal understanding. You will design and evaluate agentic architectures that balance the needs of diverse shopping use cases, making principled choices across paradigms such as single-agent and multi-agent systems, memory management strategies, and tool orchestration to optimise for quality, latency, and reliability at scale. You will work with petabytes of data and identify opportunities to apply machine learning models that make conversational systems more performant.

A day in the life

Perform hands-on analysis of large-scale multimodal interaction datasets to develop insights into how customers engage with conversational AI systems and how to improve response quality and customer experience.
Use statistical methods, experimentation, and data-driven analysis to develop scalable approaches for measuring, evaluating, and optimizing large language model (LLM)-based shopping assistant systems, leveraging structured and unstructured contextual signals.
Conduct deep-dive analyses to identify opportunities for improving conversational relevance, grounding, customer satisfaction, and downstream business impact.
Collaborate with product managers and engineers to translate analytical insights into production systems, working closely on model evaluation and deployment.
Communicate results and insights to both technical and non-technical audiences, including through presentations, written reports, and data visualizations.

About the team

The Alexa for Shopping Science team, based in London, works alongside ~150 engineers, designers and product managers, shaping the future of AI-driven shopping experiences at Amazon. The team works on every aspect of the conversational AI system, from making it agentic, enabling customers to set price alerts or empower the assistant to act on their behalf and automatically purchase products when the price is right, to understanding multimodal user queries and generating answers that combine text, image, audio and video, including deep research reports that scour the web and the Amazon catalog to provide detailed and personalised shopping guidance. We utilize and advance state-of-the-art techniques in the fields of Natural Language Processing, gen AI, Information Retrieval, Machine/Deep Learning, and Data Mining. We validate our work by actively participating in the internal and external scientific communities.

Basic Qualifications

PhD, or a Master's degree and experience in CS, CE, ML or related field
Experience in state-of-the-art deep learning model architecture design, deep learning training and optimization, and model pruning
Experience in patents or publications at top-tier peer-reviewed conferences or journals
Experience programming in Java, C++, Python or related language
Experience in any of the following areas: algorithms and data structures, parsing, numerical optimization, data mining, parallel and distributed computing, high-performance computing
Experience in building machine learning models for business application

Preferred Qualifications

Expertise in hands-on large language model post-training and an in-depth understanding of the algorithms, including both supervised fine-tuning and large-scale reinforcement learning, especially for large-scale distributed training
Strong publication record in top-tier NLP/LLM conferences such as NeurIPS, ICLR, ICML, EMNLP, ACL, NAACL, with 500+ citations.

Amazon is an equal opportunities employer. We believe passionately that employing a diverse workforce is central to our success. We make recruiting decisions based on your experience and skills. We value your passion to discover, invent, simplify and build. Protecting your privacy and the security of your data is a longstanding top priority for Amazon. Please consult our Privacy Notice (https://www.amazon.jobs/en/privacy_page) to know more about how we collect, use and transfer the personal data of our candidates.

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.
