AI Benchmarking Lead, Performance Benchmarking Evaluation

Amazon · Big Tech · IN, TS, Hyderabad · Applied Science

This role focuses on ensuring the quality and reliability of AI model evaluations for Amazon's Seller Assistant copilot. The primary responsibilities involve benchmarking AI models, evaluating audits performed by a core auditing team, improving audit consistency, and enforcing quality standards. The goal is to scale AI model evaluation coverage and ensure high-quality outcomes for sellers.

What you'd actually do

  1. Evaluate audits performed by the core auditing team to increase confidence in evaluation metrics
  2. Improve audit reliability and consistency through systematic measurement of auditor accuracy
  3. Conduct targeted calibration to ensure quality standards across the auditing function
  4. Enforce quality standards by quality-checking audits and providing actionable feedback to team members
  5. Drive continuous improvement in audit processes and methodologies

Skills

Required

  • Experience in natural language data labeling, data annotation, linguistic annotation or other forms of data markup
  • Proficiency in MS Excel
  • Basic understanding of SQL and Python
  • Experience with Microsoft Office products and applications
  • Strong verbal and written communication skills in English
  • Knowledge of SOA and of processes that involve sellers

Nice to have

  • 1 to 3 years of equivalent experience
  • Experience performing annotation-related tasks across ML data process areas
  • Strong knowledge of process documentation and analysis
  • Technical proficiency in SQL querying and Python programming for data analysis
  • Strong analytical and problem-solving skills
  • Ability to work independently and as part of a team

What the JD emphasized

  • Scaling from 61% to 90%+ active seller coverage worldwide

Other signals

  • Ensuring the reliability and accuracy of AI model evaluations as we scale from 61% to 90%+ active seller coverage worldwide.
  • Benchmark Seller Assistant AI models for relevancy, correctness, and completeness.