Data Scientist II, Alexa Sensitive Content Intelligence

Amazon · Big Tech · IN, KA, Bengaluru · Data Science

The Data Scientist II role on the Alexa Sensitive Content Intelligence (ASCI) team focuses on building AI safety systems for Alexa's next-generation AI-powered virtual assistant. This involves developing responsible AI (RAI) solutions that ensure LLMs provide safe and trustworthy responses, understand nuanced human values, and maintain customer trust. The role requires applying state-of-the-art Generative AI techniques to analyze data, run experiments, and optimize data for sensitive content detection and mitigation, working with LLMs and multimodal systems.

What you'd actually do

  1. Build AI safety systems that protect millions of Alexa customers every day.
  2. Ensure LLMs provide safe, trustworthy responses, build AI systems that understand nuanced human values across cultures, and maintain customer trust at scale.
  3. Apply state-of-the-art Generative AI techniques to analyze how well our data represents human language and run experiments to gauge downstream interactions.
  4. Design and implement principled strategies for data optimization.
  5. Analyze and automate processes for collecting and annotating LLM inputs and outputs to assess data quality and measurement.

Skills

Required

  • 3+ years of experience as a data scientist
  • 3+ years of experience with data querying languages (e.g. SQL), scripting languages (e.g. Python), or statistical/mathematical software (e.g. R, SAS, Matlab)
  • 3+ years of experience with machine learning/statistical modeling and data analysis tools and techniques, and the parameters that affect their performance
  • Experience applying theoretical models in an applied environment
  • Experience with big data: processing, filtering, and presenting large quantities (100K to millions of rows) of data
  • Experience with NLP models (e.g. LSTM, LLMs, other transformer-based models)
  • Experience with CV models (e.g. CNN, AlexNet, ResNet, GANs, ViT)

Nice to have

  • Experience in Python, Perl, or another scripting language
  • Experience diving into data to discover hidden patterns and conducting error/deviation analysis

What the JD emphasized

  • responsible AI
  • LLMs
  • multimodal systems
  • data quality
  • measurement

Other signals

  • sensitive content detection