AI Benchmarking Specialist, Sp Support - Italian, International Seller Growth

Amazon · Big Tech · IN, KA, Bengaluru · Editorial, Writing, & Content Management

This role focuses on evaluating AI systems, specifically LLMs, by designing and executing benchmarking and audit activities. It involves assessing model quality, compliance, robustness, and fairness, with a strong emphasis on handling annotations for training, measuring, and improving AI models. The role also includes preparing audit reports and ensuring data quality based on annotation guidelines.

What you'd actually do

  1. Assist in planning and executing benchmarking exercises for AI models, including defining test plans, metrics, and acceptance criteria across accuracy, robustness, bias, and reliability
  2. Support content accuracy, relevance, and privacy checks by reviewing datasets, model outputs, and data handling practices, escalating potential regulatory risks
  3. Validate data based on specific annotation guidelines, ensuring the accuracy and quality of the collected information
  4. Prepare clear audit and benchmarking reports, including error ratings, root-cause analysis, and recommendations, and contribute to presentations for senior stakeholders
  5. Maintain organized audit documentation, evidence, and benchmarking datasets to support internal review

Skills

Required

  • Speak, write, and read fluently in Italian

Nice to have

  • Experience with machine learning models

What the JD emphasized

  • handling annotations for training, measuring, and improving Artificial Intelligence (AI) and Large Language Models (LLMs)
  • evaluating AI systems
  • benchmarking and audit activities
  • assess model quality, compliance, robustness, and fairness
  • potential regulatory risks
