Data Scientist - II, Alexa Sensitive Co… at Amazon

What you'd actually do

Build AI safety systems that protect millions of Alexa customers every day.

Ensure LLMs provide safe, trustworthy responses, build AI systems that understand nuanced human values across cultures, and maintain customer trust at scale.

Apply state-of-the-art Generative AI techniques to analyze how well our data represents human language and run experiments to gauge downstream interactions.

Design and implement principled strategies for data optimization.

Analyze and automate processes for collecting and annotating LLM inputs and outputs to assess data quality and measurement.

Skills

Required

3+ years of data scientist experience
3+ years of experience with data querying languages (e.g. SQL), scripting languages (e.g. Python), or statistical/mathematical software (e.g. R, SAS, Matlab)
3+ years of experience with machine learning/statistical modeling data analysis tools and techniques, and the parameters that affect their performance
Experience applying theoretical models in an applied environment
Experience with big data: processing, filtering, and presenting large quantities (100K to millions of rows) of data
NLP models (e.g. LSTM, LLMs, other transformer-based models)
CV models (e.g. CNN, AlexNet, ResNet, GANs, ViT)

Nice to have

Experience in Python, Perl, or another scripting language
Experience diving into data to discover hidden patterns and conducting error/deviation analysis

Alexa+ is Amazon’s next-generation, AI-powered virtual assistant. Building on the original Alexa, it uses generative AI to deliver a more conversational, personalised, and effective experience. The Alexa Sensitive Content Intelligence (ASCI) team is developing responsible AI (RAI) solutions for Alexa+, empowering it to provide useful information responsibly.

The Mission

Build AI safety systems that protect millions of Alexa customers every day. As conversational AI evolves, you'll solve challenging problems in Responsible AI by ensuring LLMs provide safe, trustworthy responses, building AI systems that understand nuanced human values across cultures, and maintaining customer trust at scale.

We are looking for a passionate, talented, and inventive Data Scientist-II to help build industry-leading technology with Large Language Models (LLMs) and multimodal systems; the role requires good knowledge of learning and generative models. You will work with a team of exceptional Data Scientists in a hybrid, fast-paced organization where scientists, engineers, and product managers work together to build customer-facing experiences. You will collaborate with other data scientists while understanding the role data plays in developing data sets and exemplars that meet customer needs. You will analyze and automate processes for collecting and annotating LLM inputs and outputs to assess data quality and measurement.

You will apply state-of-the-art Generative AI techniques to analyze how well our data represents human language and run experiments to gauge downstream interactions. You will work collaboratively with other data scientists and applied scientists to design and implement principled strategies for data optimization.

Key job responsibilities

A Data Scientist-II should have a reasonably good understanding of NLP models (e.g. LSTM, LLMs, other transformer-based models) or CV models (e.g. CNN, AlexNet, ResNet, GANs, ViT) and know ways to improve their performance using data. You will leverage your technical expertise to improve and extend existing models. Your work will directly impact our customers in the form of products and services that make use of speech, language, and computer vision technologies.

You will be joining a select group of people making history by producing one of the most highly rated products in Amazon's history. If you are looking for a challenging and innovative role where you can solve important problems while growing in your career, this may be the place for you.

A day in the life

You will work with a group of talented scientists on running experiments to test scientific proposals and solutions that improve our sensitive content detection and mitigation for worldwide coverage. This will involve collaboration with partner teams, including engineering, PMs, data annotators, and other scientists, to discuss data quality, policy, model development, and solution implementation. You will work with other scientists, collaborating on and contributing to extending and improving solutions for the team.

About the team

Our team pioneers Responsible AI for conversational assistants. We ensure Alexa delivers safe, trustworthy experiences across all devices, modalities, and languages worldwide. We work on frontier AI safety challenges, and we're looking for scientists who want to help shape the future of trustworthy AI.

Basic Qualifications

3+ years of data scientist experience
3+ years of experience with data querying languages (e.g. SQL), scripting languages (e.g. Python), or statistical/mathematical software (e.g. R, SAS, Matlab)
3+ years of experience with machine learning/statistical modeling data analysis tools and techniques, and the parameters that affect their performance
Experience applying theoretical models in an applied environment
Experience with big data: processing, filtering, and presenting large quantities (100K to millions of rows) of data

Preferred Qualifications

Experience in Python, Perl, or another scripting language
Experience diving into data to discover hidden patterns and conducting error/deviation analysis

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.
