Data Scientist, AWS Quick Data

Amazon · Big Tech · Santa Clara, CA · Data Science

The Data Scientist II will focus on developing evaluation and benchmarking datasets for enterprise AI features, specifically for Amazon Quick Suite. The work uses generative AI techniques, including LLMs for synthetic data generation and LLM-as-a-judge setups, to assess model performance, ensure data quality, and contribute to Responsible AI initiatives. The role also includes building scalable data pipelines and tools for continuous evaluation.

What you'd actually do

  1. Design and develop comprehensive evaluation and benchmarking datasets for Quick Suite AI-powered features
  2. Use LLMs to generate synthetic data corpora, and evaluate data and assess its quality using LLM-as-a-judge settings
  3. Create ground truth datasets with high-quality question-answer pairs across diverse domains and use cases
  4. Lead human annotation initiatives and model evaluation audits to ensure data quality and relevance
  5. Develop and refine annotation guidelines and quality frameworks for evaluation tasks

Skills

Required

  • 2+ years of data scientist experience
  • 3+ years of experience with data querying languages (e.g., SQL), scripting languages (e.g., Python), or statistical/mathematical software (e.g., R, SAS, MATLAB)
  • 3+ years of experience with machine learning/statistical modeling data analysis tools and techniques, and the parameters that affect their performance
  • 1+ years of experience working with or evaluating AI systems
  • 1+ years of experience creating or contributing to mathematical textbooks, research papers, or educational content
  • Master's degree in Science, Technology, Engineering, or Mathematics (STEM), or experience working in Science, Technology, Engineering, or Mathematics (STEM)
  • Experience applying theoretical models in an applied environment

Nice to have

  • Ph.D. in Science, Technology, Engineering, or Mathematics (STEM)
  • Knowledge of machine learning concepts and their application to reasoning and problem-solving
  • Experience in an ML or data scientist role with a large technology company
  • Experience in defining and creating benchmarks for assessing GenAI model performance
  • Experience working on multi-team, cross-disciplinary projects
  • Experience applying quantitative analysis to solve business problems and making data-driven business decisions
  • Experience communicating complex concepts effectively, both in writing and verbally

What the JD emphasized

  • evaluation and benchmarking datasets
  • Responsible AI
  • LLM-as-a-judge

Other signals

  • evaluation and benchmarking datasets
  • Generative AI
  • Responsible AI
  • LLMs for synthetic data generation
  • LLM-as-a-judge