What you'd actually do

Define and frame new research problems in fraud detection where neither the problem nor the solution is well defined.

Apply new machine learning approaches, models, and algorithms to detect sophisticated invalid traffic.

Apply domain knowledge to perform broad data analysis as a precursor to modeling and build business insights.

Work with unstructured and massive datasets to deliver results.

Produce research reports meeting top-tier external publication standards.

Skills

Required

2+ years of experience as a data scientist
3+ years of experience with data querying languages (e.g. SQL), scripting languages (e.g. Python), or statistical/mathematical software (e.g. R, SAS, Matlab)
3+ years of experience with machine learning and statistical modeling tools and techniques for data analysis, and with the parameters that affect their performance
1+ years of experience guiding and coaching a group of researchers
1+ years of experience working with or evaluating AI systems
1+ years of experience creating or contributing to mathematical textbooks, research papers, or educational content
Master's degree in Science, Technology, Engineering, or Mathematics (STEM), or experience working in a STEM field
Experience applying theoretical models in an applied environment

Nice to have

Knowledge of machine learning concepts and their application to reasoning and problem-solving
Experience in Python, Perl, or another scripting language
Experience in an ML or data scientist role with a large technology company
Experience in defining and creating benchmarks for assessing GenAI model performance
Experience working on multi-team, cross-disciplinary projects
Experience applying quantitative analysis to solve business problems and making data-driven business decisions
Experience effectively communicating complex concepts through written and verbal communication

Other signals

detect sophisticated invalid traffic (IVT)

leveraging state-of-the-art techniques in deep learning and generative modeling

user behavior and multi-modal representation learning

anomaly detection, time-series analysis

sparse labeling methods

billions of ad events daily

novel algorithms that balance precision and recall

strict latency constraints

Amazon Ads is a multi-billion dollar global business that delivers advertising experiences across Amazon's owned-and-operated properties (including Prime Video, Twitch, Fire TV, and Amazon.com), third-party publisher networks, and emerging channels like generative AI-powered shopping experiences. As one of the fastest-growing segments of Amazon, we operate at unprecedented scale across desktop, mobile, connected TV, and emerging surfaces.

Within Amazon Ads, Traffic Quality is a critical pillar of advertiser trust and marketplace integrity. Our mission is to build advanced capabilities that work at petabyte scale to detect sophisticated invalid traffic (IVT), which includes non-human traffic, bot networks, and fraudulent engagement patterns across programmatic advertising. We are on a journey to establish Amazon Ads as an industry leader in traffic quality standards and transparency. Our research agenda focuses on staying ahead of adversarial actors through continuous innovation in detection methodologies, leveraging state-of-the-art techniques in deep learning and generative modeling, user behavior and multi-modal representation learning, anomaly detection, time-series analysis, and sparse labeling methods. We process billions of ad events daily, developing novel algorithms that balance precision and recall while operating under strict latency constraints. Our work directly protects hundreds of millions of dollars in advertiser spend annually while maintaining a seamless user experience.

Key job responsibilities

As a Data Scientist II in Traffic Quality, you will solve inherently hard problems in advertising fraud detection by applying advanced statistical techniques and machine learning. You'll work on systems that process billions of ad impressions and clicks per day, using Amazon's cloud services including EC2, S3, EMR, SageMaker, and Redshift.

Define and frame new research problems in fraud detection where neither the problem nor the solution is well defined.
Apply new machine learning approaches, models, and algorithms to detect sophisticated invalid traffic.
Apply domain knowledge to perform broad data analysis as a precursor to modeling and build business insights.
Work with unstructured and massive datasets to deliver results.
Produce research reports meeting top-tier external publication standards.
Mentor and develop junior scientists on the team.

About the team

Here are a few papers published by the team:

1/ Scaling Generative Pre-training for User Ad Activity Sequences. AdKDD 2023.
2/ SLIDR: Real-time Robot Detection On Online Ads. IAAI 2023, Deployed Highly Innovative Applications of AI Track (AAAI 2023).
3/ Self-supervised Representation Learning Across Sequential and Tabular Features Using Transformers. NeurIPS 2022, First Table Representation Learning Workshop.

Basic Qualifications

2+ years of experience as a data scientist
3+ years of experience with data querying languages (e.g. SQL), scripting languages (e.g. Python), or statistical/mathematical software (e.g. R, SAS, Matlab)
3+ years of experience with machine learning and statistical modeling tools and techniques for data analysis, and with the parameters that affect their performance
1+ years of experience guiding and coaching a group of researchers
1+ years of experience working with or evaluating AI systems
1+ years of experience creating or contributing to mathematical textbooks, research papers, or educational content
Master's degree in Science, Technology, Engineering, or Mathematics (STEM), or experience working in a STEM field
Experience applying theoretical models in an applied environment

Preferred Qualifications

Knowledge of machine learning concepts and their application to reasoning and problem-solving
Experience in Python, Perl, or another scripting language
Experience in an ML or data scientist role with a large technology company
Experience in defining and creating benchmarks for assessing GenAI model performance
Experience working on multi-team, cross-disciplinary projects
Experience applying quantitative analysis to solve business problems and making data-driven business decisions
Experience effectively communicating complex concepts through written and verbal communication

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.

Data Scientist, Traffic Quality