AI Benchmarking Specialist, Sp Support - German, International Seller Growth

Amazon · Big Tech · IN, KA, Bengaluru · Editorial, Writing, & Content Management

This role evaluates AI systems, specifically large language models (LLMs), by designing and executing benchmarking and audit activities. Core responsibilities include assessing model quality, compliance, robustness, and fairness, and handling annotations used to train, measure, and improve AI models. The role also involves preparing audit reports and ensuring data quality.

What you'd actually do

  1. Assist in planning and executing benchmarking exercises for AI models, including defining test plans, metrics, and acceptance criteria across accuracy, robustness, bias, and reliability
  2. Support content accuracy, relevance, and privacy checks by reviewing datasets, model outputs, and data handling practices, and escalate potential regulatory risks
  3. Validate data based on specific annotation guidelines, ensuring the accuracy and quality of the collected information
  4. Prepare clear audit and benchmarking reports, including error ratings, root-cause analysis, and recommendations, and contribute to presentations for senior stakeholders
  5. Maintain organized audit documentation, evidence, and benchmarking datasets to support internal review

Skills

Required

  • Speak, write, and read fluently in German

Nice to have

  • Experience with machine learning models

What the JD emphasized

  • handling annotations for training, measuring, and improving Artificial Intelligence (AI) and Large Language Models (LLMs)
  • evaluating AI systems
  • benchmarking and audit activities
  • assess model quality, compliance, robustness, and fairness
  • potential regulatory risks
