What you'd actually do

Design and develop comprehensive evaluation and benchmarking datasets for Quick Suite AI-powered features

Leverage LLMs for synthetic data corpora generation; data evaluation and quality assessment using LLM-as-a-judge settings

Create ground truth datasets with high-quality question-answer pairs across diverse domains and use cases

Lead human annotation initiatives and model evaluation audits to ensure data quality and relevance

Develop and refine annotation guidelines and quality frameworks for evaluation tasks

Skills

Required

2+ years of experience as a data scientist
3+ years of experience with data querying languages (e.g. SQL), scripting languages (e.g. Python), or statistical/mathematical software (e.g. R, SAS, Matlab)
3+ years of experience with machine learning/statistical modeling data analysis tools and techniques, and the parameters that affect their performance
1+ years of experience working with or evaluating AI systems
1+ years of experience creating or contributing to mathematical textbooks, research papers, or educational content
Master's degree in Science, Technology, Engineering, or Mathematics (STEM), or experience working in Science, Technology, Engineering, or Mathematics (STEM)
Experience applying theoretical models in an applied environment

Nice to have

Ph.D. in Science, Technology, Engineering, or Mathematics (STEM)
Knowledge of machine learning concepts and their application to reasoning and problem-solving
Experience in an ML or data scientist role with a large technology company
Experience in defining and creating benchmarks for assessing GenAI model performance
Experience working on multi-team, cross-disciplinary projects
Experience applying quantitative analysis to solve business problems and making data-driven business decisions
Experience effectively communicating complex concepts through written and verbal communication

Amazon Quick Suite is an enterprise AI platform that transforms how organizations work with their data and knowledge. Combining generative AI-powered search, deep research capabilities, intelligent agents and automations, and comprehensive business intelligence, Quick Suite serves tens of thousands of users. Our platform processes thousands of queries monthly, helping teams make faster, data-driven decisions while maintaining enterprise-grade security and governance. From natural language interactions with complex datasets to automated workflows and custom AI agents, Quick Suite is redefining workplace productivity at unprecedented scale.

We are seeking a Data Scientist II to join our Quick Data team, focusing on evaluation and benchmarking data development for Quick Suite features, with particular emphasis on Research and other generative AI capabilities. Our mission is to engineer high-quality datasets that are essential to the success of Amazon Quick Suite. From human evaluations and Responsible AI safeguards to Retrieval-Augmented Generation and beyond, our work ensures that Generative AI is enterprise-ready, safe, and effective for users at scale. As part of our diverse team—including data scientists, engineers, language engineers, linguists, and program managers—you will collaborate closely with science, engineering, and product teams. We are driven by customer obsession and a commitment to excellence.

Key job responsibilities

In this role, you will leverage data-centric AI principles to assess the impact of data on model performance and the broader machine learning pipeline. You will apply Generative AI techniques to evaluate how well our data represents human language and conduct experiments to measure downstream interactions. Specific responsibilities include:

Design and develop comprehensive evaluation and benchmarking datasets for Quick Suite AI-powered features
Leverage LLMs for synthetic data corpora generation; data evaluation and quality assessment using LLM-as-a-judge settings
Create ground truth datasets with high-quality question-answer pairs across diverse domains and use cases
Lead human annotation initiatives and model evaluation audits to ensure data quality and relevance
Develop and refine annotation guidelines and quality frameworks for evaluation tasks
Conduct statistical analysis to measure model performance, identify failure patterns, and guide improvement strategies
Collaborate with ML scientists and engineers to translate evaluation insights into actionable product improvements
Build scalable data pipelines and tools to support continuous evaluation and benchmarking efforts
Contribute to Responsible AI initiatives by developing safety and fairness evaluation datasets

About the team

Why AWS?

Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating, which is why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.

Inclusive Team Culture

Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon conferences, inspire us to never stop embracing our uniqueness.

Mentorship & Career Growth

We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.

Work/Life Balance

We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.

Hybrid Work

We value innovation and recognize this sometimes requires uninterrupted time to focus on a build. We also value in-person collaboration and time spent face-to-face. Our team affords employees options to work in the office every day or in a flexible, hybrid work model near one of our U.S. Amazon offices.

Basic Qualifications

2+ years of experience as a data scientist
3+ years of experience with data querying languages (e.g. SQL), scripting languages (e.g. Python), or statistical/mathematical software (e.g. R, SAS, Matlab)
3+ years of experience with machine learning/statistical modeling data analysis tools and techniques, and the parameters that affect their performance
1+ years of experience working with or evaluating AI systems
1+ years of experience creating or contributing to mathematical textbooks, research papers, or educational content
Master's degree in Science, Technology, Engineering, or Mathematics (STEM), or experience working in Science, Technology, Engineering, or Mathematics (STEM)
Experience applying theoretical models in an applied environment

Preferred Qualifications

Ph.D. in Science, Technology, Engineering, or Mathematics (STEM)
Knowledge of machine learning concepts and their application to reasoning and problem-solving
Experience in an ML or data scientist role with a large technology company
Experience in defining and creating benchmarks for assessing GenAI model performance
Experience working on multi-team, cross-disciplinary projects
Experience applying quantitative analysis to solve business problems and making data-driven business decisions
Experience effectively communicating complex concepts through written and verbal communication

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Los Angeles County applicants: Job duties for this position include: work safely and cooperatively with other employees, supervisors, and staff; adhere to standards of excellence despite stressful conditions; communicate effectively and respectfully with employees, supervisors, and staff to ensure exceptional customer service; and follow all federal, state, and local laws and Company policies. Criminal history may have a direct, adverse, and negative relationship with some of the material job duties of this position. These include the duties and responsibilities listed above, as well as the abilities to adhere to company policies, exercise sound judgment, effectively manage stress and work safely and respectfully with others, exhibit trustworthiness and professionalism, and safeguard business operations and the Company’s reputation. Pursuant to the Los Angeles County Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.

The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at https://amazon.jobs/en/benefits.

USA, CA, Santa Clara - 157,300.00 - 212,800.00 USD annually
USA, NY, New York - 153,400.00 - 207,500.00 USD annually
USA, WA, Seattle - 136,000.00 - 184,000.00 USD annually

Data Scientist, AWS Quick Data
