Data Scientist, AWS Quick Data

Amazon · Big Tech · Santa Clara, CA · Data Science

The Data Scientist will focus on developing evaluation and benchmarking datasets for generative AI capabilities within the Amazon Quick Suite enterprise AI platform. This includes leveraging LLMs for synthetic data generation, creating ground truth datasets, leading human annotation initiatives, and contributing to Responsible AI efforts to ensure enterprise-readiness, safety, and effectiveness of AI at scale.

What you'd actually do

  1. Design and develop comprehensive evaluation and benchmarking datasets for Quick Suite AI-powered features
  2. Leverage LLMs to generate synthetic data corpora, and to evaluate data and assess its quality in LLM-as-a-judge settings
  3. Create ground truth datasets with high-quality question-answer pairs across diverse domains and use cases
  4. Lead human annotation initiatives and model evaluation audits to ensure data quality and relevance
  5. Develop and refine annotation guidelines and quality frameworks for evaluation tasks

Skills

Required

  • 2+ years of experience as a data scientist
  • 3+ years of experience with data querying languages (e.g. SQL), scripting languages (e.g. Python), or statistical/mathematical software (e.g. R, SAS, Matlab)
  • 3+ years of experience with machine learning and statistical modeling tools and techniques for data analysis, and with the parameters that affect their performance
  • 1+ years of experience working with or evaluating AI systems
  • 1+ years of experience creating or contributing to mathematical textbooks, research papers, or educational content
  • Master's degree in Science, Technology, Engineering, or Mathematics (STEM), or experience working in Science, Technology, Engineering, or Mathematics (STEM)
  • Experience applying theoretical models in an applied environment

Nice to have

  • Ph.D. in Science, Technology, Engineering, or Mathematics (STEM)
  • Knowledge of machine learning concepts and their application to reasoning and problem-solving
  • Experience in an ML or data scientist role with a large technology company
  • Experience in defining and creating benchmarks for assessing GenAI model performance
  • Experience working on multi-team, cross-disciplinary projects
  • Experience applying quantitative analysis to solve business problems and making data-driven business decisions
  • Experience effectively communicating complex concepts through written and verbal communication

What the JD emphasized

  • evaluation and benchmarking datasets
  • LLM-as-a-judge
  • Responsible AI initiatives
  • enterprise AI platform
