AI Benchmarking Specialist - Chinese, International Seller Growth

Amazon · Big Tech · 31, China +1 · Editorial, Writing, & Content Management

This role focuses on evaluating AI systems, specifically LLMs, by designing and executing benchmarking and audit activities that assess model quality, compliance, robustness, and fairness. It involves annotating data used to train, measure, and improve AI models, preparing audit reports, and ensuring data quality. The role supports the Seller AI team in developing Gen-AI/LLM-powered tools for sellers.

What you'd actually do

  1. Assist in planning and executing benchmarking exercises for AI models, including defining test plans, metrics, and acceptance criteria across accuracy, robustness, bias, and reliability
  2. Support content accuracy, relevancy, and privacy checks by reviewing datasets, model outputs, and data handling practices, escalating potential regulatory risks
  3. Validate data against specific annotation guidelines, ensuring the accuracy and quality of the collected information
  4. Prepare clear audit and benchmarking reports, including error ratings, root-cause analysis, and recommendations, and contribute to presentations for senior stakeholders
  5. Maintain organized audit documentation, evidence, and benchmarking datasets to support internal review

Skills

Required

  • AI benchmarking
  • model evaluation
  • data validation
  • report generation
  • stakeholder communication

Nice to have

  • Experience with machine learning models

What the JD emphasized

  • AI auditing
  • quality assurance
  • benchmarking
  • model quality
  • compliance
  • robustness
  • fairness
  • annotation

Other signals

  • AI benchmarking
  • model quality
  • compliance
  • robustness
  • fairness
  • AI auditing
  • quality assurance
  • annotation
  • LLM evaluation