Applied Scientist, Conversational Assistant Modeling and Learning

Amazon · Big Tech · Bellevue, WA · Machine Learning Science

Applied Scientist role at Amazon focusing on building Alexa+, an LLM-powered conversational assistant. Responsibilities include LLM fine-tuning, alignment, agentic reasoning, and evaluation pipelines. The role involves designing and implementing end-to-end systems, translating research into production, and publishing results. It operates at massive scale across multiple languages and device types.

What you'd actually do

  1. Improve the efficiency of LLM, VLM, and agent training and evaluation pipelines, including distributed training, inference serving, data loading, checkpointing, memory usage, and GPU utilization
  2. Design, implement, and evaluate novel approaches to LLM fine-tuning, alignment (RLHF, DPO), and distillation for production deployment
  3. Architect agentic systems — multi-step reasoning, tool use, planning, and orchestration
  4. Develop evaluation frameworks and methodologies that go beyond standard benchmarks to capture real-world conversational quality
  5. Translate research advances into customer-facing products, working closely with engineering, product, and cross-functional science teams

Skills

Required

  • Master's degree or above in computer science, machine learning, engineering, or related fields
  • 3+ years of programming experience in Java, C++, Python, or a related language
  • 1+ years of experience with distributed training frameworks such as verl, Megatron, FSDP, DeepSpeed, Ray, or similar systems, and inference engines such as vLLM, TensorRT-LLM, Triton, SGLang, TGI
  • 3+ years of experience with modeling languages and tools such as PyTorch, TensorFlow, R, scikit-learn, NumPy, and SciPy
  • Solid ML background and familiarity with NLU, NLG, and LLM training and evaluation

Nice to have

  • PhD in computer science, machine learning, engineering, or related fields
  • 3+ years of experience with distributed training frameworks such as verl, Megatron, FSDP, DeepSpeed, Ray, or similar systems, and inference engines such as vLLM, TensorRT-LLM, Triton, SGLang, TGI
  • Publications at peer-reviewed NLP/ML conferences (e.g. ACL, EMNLP, NAACL, NeurIPS, ICLR, ICML, AAAI)
  • Scientific thinking and the ability to invent, with a track record of thought leadership and contributions that have advanced the field

What the JD emphasized

  • end-to-end systems
  • production deployment
  • customer-facing products
  • massive scale
  • multilingual/multimodal understanding

Other signals

  • LLM fine-tuning
  • alignment
  • agentic reasoning
  • evaluation