What you'd actually do

Own the end-to-end model development lifecycle for LLMs that power Alexa+.

Design and execute training recipes (including supervised fine-tuning, reinforcement learning from human feedback, and preference optimization), iterating rapidly on data, hyperparameters, and architectures to move quality and efficiency metrics.

Build robust evaluation frameworks to measure model performance, diagnose failure modes, and quantify improvements.

Collaborate closely with research scientists and engineers to bring models from experimentation to production at scale.

Advance the state of the art by publishing findings at top-tier NLP/ML venues (ACL, EMNLP, NeurIPS, ICML, ICLR), ensuring your research drives both customer impact and scientific contribution.

Skills

Required

PhD in computer science, machine learning, engineering, or related fields
Knowledge of at least one programming language such as Java, C#, JavaScript, Python, Ruby or Perl
Experience in designing experiments and statistical analysis of results
Hands-on experience building, training, and evaluating LLMs

Nice to have

Experience working with large, complex data sets
Experience working effectively with science, data processing, and software engineering teams
Strong written and verbal communication skills, with experience presenting to technical and non-technical audiences, including senior leadership
Experience building and deploying LLM solutions in production or at scale
Hands-on experience with large language model training and fine-tuning via pre-training, SFT, and/or RLHF/preference optimization
Experience with LLM evaluation: building benchmarks, LLM-as-a-judge, or defect/quality analysis
Familiarity with modern training/inference infrastructure (e.g., distributed training, RL frameworks, model serving)

As an Applied Scientist II in the Alexa Conversational Modelling Intelligence team within Alexa AI, you will drive model post-training for Large Language Models that power Alexa+. You'll adopt and adapt state-of-the-art techniques — including supervised fine-tuning, RLHF, and preference optimization — running rigorous experiments and translating findings into production-ready solutions that directly improve the customer experience for millions of users worldwide.

You will own the full model development cycle from data curation through training, evaluation, and deployment. Your day-to-day will involve developing evaluation methods and metrics, diagnosing model defects, and iterating on recipes to move concrete quality and efficiency benchmarks. You'll write clean, reproducible code, contribute to shared tooling, and collaborate closely with scientists and engineers to bring models from experimentation to scale.

You are technically curious, experiment-driven, and motivated by real customer impact. You will also advance the state of the art by publishing at top-tier NLP/ML conferences (ACL, EMNLP, NeurIPS, ICML, ICLR) — contributing to the broader research community while grounding your work in measurable outcomes.

Key job responsibilities

As an Applied Scientist II in the Alexa Conversational Modelling Intelligence team, you will own the end-to-end model development lifecycle for LLMs that power Alexa+. You'll design and execute training recipes (including supervised fine-tuning, reinforcement learning from human feedback, and preference optimization), iterating rapidly on data, hyperparameters, and architectures to move quality and efficiency metrics. Your work will directly shape how millions of customers interact with Alexa daily.

You will build robust evaluation frameworks to measure model performance, diagnose failure modes, and quantify improvements. This includes developing benchmarks, implementing LLM-as-a-judge pipelines, and conducting rigorous defect analysis to identify where models fall short and why. You'll translate these insights into targeted improvements that close gaps in conversational quality, safety, and fluency.

You will collaborate closely with research scientists and engineers to bring models from experimentation to production at scale. You'll contribute to shared tooling and infrastructure, write clean and reproducible code, and document your methods so the team can build on your work. You are also expected to advance the state of the art by publishing findings at top-tier NLP/ML venues (ACL, EMNLP, NeurIPS, ICML, ICLR), ensuring your research drives both customer impact and scientific contribution.

A day in the life

As an Applied Scientist II, your day will involve launching and monitoring training runs, analyzing experiment results, and iterating on model recipes based on evaluation data. You'll participate in science reviews with fellow researchers, sync with engineering partners on deployment readiness, and deep-dive into model outputs to understand behavioral patterns. You'll balance hands-on experimentation with collaborative problem-solving, working across the Alexa AI organization to align model improvements with customer-facing goals and product priorities.

About the team

The Alexa Conversational Modelling Intelligence team builds industry-leading LLM-based conversational technologies that customers love. Our mission is to push the envelope in LLMs for Alexa to deliver the best-possible customer experience. As an Applied Scientist, you'll contribute directly to that mission through model development and experimentation.

Basic Qualifications

PhD in computer science, machine learning, engineering, or related fields
Knowledge of at least one programming language such as Java, C#, JavaScript, Python, Ruby or Perl
Experience in designing experiments and statistical analysis of results
Hands-on experience building, training, and evaluating LLMs

Preferred Qualifications

Publications at top-tier conferences, such as CVPR, ICCV, ECCV, or NeurIPS
Experience working with large, complex data sets
Experience working effectively with science, data processing, and software engineering teams
Strong written and verbal communication skills, with experience presenting to technical and non-technical audiences, including senior leadership
Experience building and deploying LLM solutions in production or at scale
Hands-on experience with large language model training and fine-tuning via pre-training, SFT, and/or RLHF/preference optimization
Experience with LLM evaluation: building benchmarks, LLM-as-a-judge, or defect/quality analysis
Familiarity with modern training/inference infrastructure (e.g., distributed training, RL frameworks, model serving)

Amazon is an equal opportunities employer. We believe passionately that employing a diverse workforce is central to our success. We make recruiting decisions based on your experience and skills. We value your passion to discover, invent, simplify and build. Protecting your privacy and the security of your data is a longstanding top priority for Amazon. Please consult our Privacy Notice (https://www.amazon.jobs/en/privacy_page) to know more about how we collect, use and transfer the personal data of our candidates.

m/w/d

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.

Applied Scientist - LLM, Alexa Conversational Modelling Intelligence
