AI Benchmarking Lead, Performance Benchmarking Evaluation

Amazon · Big Tech · Hyderabad, Telangana, IN · Applied Science

The AI Benchmarking Lead will focus on ensuring the quality and reliability of AI model evaluations for Amazon's Seller Assistant copilot. This role involves benchmarking AI models, evaluating audit processes, improving audit consistency, and enforcing quality standards to support the scaling of the product to a wider seller base.

What you'd actually do

  1. Evaluate audits performed by the core auditing team to increase confidence in evaluation metrics
  2. Improve audit reliability and consistency through systematic measurement of auditor accuracy
  3. Conduct targeted calibration to ensure quality standards across the auditing function
  4. Enforce quality standards by quality-checking audits and providing actionable feedback to team members
  5. Drive continuous improvement in audit processes and methodologies
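The audit-reliability duties above (items 2 and 3) typically come down to measuring agreement between auditors. A minimal sketch of chance-corrected agreement via Cohen's kappa, with entirely hypothetical auditor verdicts:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two auditors' label sequences."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed raw agreement rate.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Agreement expected by chance, given each auditor's label frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical pass/fail audit verdicts from two auditors on six items.
auditor_1 = ["pass", "pass", "fail", "pass", "fail", "pass"]
auditor_2 = ["pass", "fail", "fail", "pass", "fail", "pass"]
kappa = cohens_kappa(auditor_1, auditor_2)  # moderate agreement
```

Kappa near 1 indicates consistent auditors; values near 0 suggest the calibration sessions the role describes are needed.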

Skills

Required

  • Natural language data labeling
  • Data annotation
  • Linguistic annotation
  • Data markup
  • MS Excel
  • SQL
  • Python
  • Microsoft Office products and applications
  • Strong verbal and written communication skills in English
  • Knowledge of SOA and processes that deal with sellers

Nice to have

  • 1 to 3 years of equivalent experience
  • Experience performing annotation-related tasks across ML data process areas
  • Strong knowledge of process documentation and analysis
  • Technical proficiency in SQL querying and Python programming for data analysis
  • Strong analytical and problem-solving skills
  • Ability to work independently and as part of a team

What the JD emphasized

  • Scale from 61% to 90%+ active seller coverage worldwide
  • Benchmark Seller Assistant AI models for relevancy, correctness, and completeness
  • Evaluate audits performed by the core auditing team to increase confidence in evaluation metrics
  • Improve audit reliability and consistency through systematic measurement of auditor accuracy
  • Enforce quality standards by quality-checking audits and providing actionable feedback to team members
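Benchmarking models "for relevancy, correctness, and completeness" usually means aggregating per-dimension rubric scores across audited responses. A hypothetical sketch (dimension names taken from the JD; the records and scoring scale are invented):

```python
from statistics import mean

# Hypothetical audit records: each model response scored 0-1
# on the three dimensions the JD names.
audits = [
    {"relevancy": 1.0, "correctness": 1.0, "completeness": 0.5},
    {"relevancy": 1.0, "correctness": 0.0, "completeness": 1.0},
    {"relevancy": 0.5, "correctness": 1.0, "completeness": 1.0},
]

def benchmark(records):
    """Mean score per evaluation dimension across audited responses."""
    dims = records[0].keys()
    return {d: mean(r[d] for r in records) for d in dims}

scores = benchmark(audits)
```

Tracking these dimension means over time (and per auditor) is one way the quality-checking and feedback duties above could be made systematic.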
