AI Benchmarking Specialist, Spanish Support, International Seller Growth

Amazon · Big Tech · IN, KA, Bengaluru · Editorial, Writing, & Content Management

This role focuses on evaluating AI systems, specifically LLMs, by designing and executing benchmarking and audit activities. It involves assessing model quality, compliance, robustness, and fairness, as well as handling annotations used to train, measure, and improve AI models. The role also includes preparing audit reports and ensuring data quality.

What you'd actually do

  1. Assist in planning and executing benchmarking exercises for AI models, including defining test plans, metrics, and acceptance criteria across accuracy, robustness, bias, and reliability
  2. Support content accuracy, relevance, and privacy checks by reviewing datasets, model outputs, and data handling practices, escalating potential regulatory risks
  3. Validate data against specific annotation guidelines, ensuring the accuracy and quality of the collected information
  4. Prepare clear audit and benchmarking reports, including error ratings, root-cause analysis, and recommendations, and contribute to presentations for senior stakeholders
  5. Maintain organized audit documentation, evidence, and benchmarking datasets to support internal review

Skills

Required

  • Fluent Spanish (speak, read, write)

Nice to have

  • Experience with machine learning models

What the JD emphasized

  • Spanish
  • AI Benchmarking Specialist
  • AI auditing
  • quality assurance
  • traditional audit-style documentation
  • stakeholder communication
  • regulatory risks
  • annotation guidelines
  • audit documentation
  • benchmarking datasets
  • process efficiencies
  • automation
  • AI audit methodologies
  • checklists
  • test frameworks
  • regulations
  • best practices evolve
  • annotations for training, measuring, and improving Artificial Intelligence (AI) and Large Language Models (LLMs)
  • seller experience
  • accuracy
  • robustness
  • bias
  • fairness
  • content accuracy
  • relevancy
  • privacy checks
  • data handling practices
  • quality of the collected information
  • error ratings
  • root-cause analysis
  • recommendations
  • senior stakeholders
  • internal review
  • team members
  • managers
  • drive process efficiencies
  • explore opportunities for automation
  • enhance the productivity and effectiveness of data generation
  • contributing to the development and continuous improvement of AI audit methodologies

Other signals

  • evaluating AI systems
  • benchmarking and audit activities
  • model quality, compliance, robustness, and fairness
  • annotations for training, measuring, and improving AI and LLMs