Applied Scientist - LLM, Alexa Conversational Modelling Intelligence

Amazon · Big Tech · DE, Belgium +1 · Machine Learning Science

Applied Scientist II on the Alexa Conversational Modelling Intelligence team, focused on LLM post-training (SFT, RLHF, preference optimization) for Alexa+. Drives model development from data curation through training, evaluation, and deployment. Builds evaluation frameworks, diagnoses defects, and iterates on training recipes. Collaborates with scientists and engineers, contributes to tooling, and publishes research. Aims to improve the customer experience for millions.

What you'd actually do

  1. Own the end-to-end model development lifecycle for LLMs that power Alexa+.
  2. Design and execute training recipes (supervised fine-tuning, reinforcement learning from human feedback, and preference optimization), iterating rapidly on data, hyperparameters, and architectures to move quality and efficiency metrics.
  3. Build robust evaluation frameworks to measure model performance, diagnose failure modes, and quantify improvements.
  4. Collaborate closely with research scientists and engineers to bring models from experimentation to production at scale.
  5. Advance the state of the art by publishing findings at top-tier NLP/ML venues (ACL, EMNLP, NeurIPS, ICML, ICLR), ensuring your research drives both customer impact and scientific contribution.

Skills

Required

  • PhD in computer science, machine learning, engineering, or related fields
  • Knowledge of at least one programming language such as Java, C#, JavaScript, Python, Ruby or Perl
  • Experience in designing experiments and statistical analysis of results
  • Hands-on experience building, training, and evaluating LLMs

Nice to have

  • Experience working with large, complex data sets
  • Experience working effectively with science, data processing, and software engineering teams
  • Strong written and verbal communication skills for technical and non-technical audiences, including senior leadership
  • Experience building and deploying LLM solutions in production or at scale
  • Hands-on experience training and fine-tuning large language models via pre-training, SFT, and/or RLHF/preference optimization
  • Experience with LLM evaluation: building benchmarks, LLM-as-a-judge, or defect/quality analysis
  • Familiarity with modern training/inference infrastructure (e.g., distributed training, RL frameworks, model serving)

What the JD emphasized

  • publishing at top-tier NLP/ML conferences

Other signals

  • LLM post-training
  • supervised fine-tuning
  • RLHF
  • preference optimization
  • evaluation frameworks
  • defect analysis
  • publishing at top-tier NLP/ML conferences