Currently tracking 995 active AI roles, up 64% versus the prior four weeks. Primary focus: Agent · Engineering. Salary range: $65k–$465k (average $196k).
| Title | Stage | AI score |
|---|---|---|
| **Applied Science Manager, Sponsored Products and Brands** Leads a Continuous Model Evaluation and Learning workstream within Amazon Ads' Sponsored Products and Brands team. The role involves leading a team of applied scientists and engineers to build and ship an evaluation and remediation framework for an agentic brand-intelligence system: designing evaluation metrics, developing optimization engines for prompts and synthetic data, and ensuring offline-to-online consistency for quality improvements. The goal is to enable autonomous detect-diagnose-remediate loops that scale quality across brand skills. | Eval Gate · Agent | 8 |
| **Data Scientist, AWS Quick Data** Focuses on developing evaluation and benchmarking datasets for generative AI capabilities within the Amazon Quick Suite enterprise AI platform. This includes leveraging LLMs for synthetic data generation, creating ground-truth datasets, leading human-annotation initiatives, and contributing to Responsible AI efforts to ensure enterprise readiness, safety, and effectiveness of AI at scale. | Eval Gate · Data | 8 |
| **Data Scientist, AWS Quick Data** The Data Scientist II will focus on developing evaluation and benchmarking datasets for enterprise AI features, specifically for Amazon Quick Suite. This involves leveraging generative AI techniques, LLMs for synthetic data generation, and LLM-as-a-judge settings to assess model performance, ensure data quality, and contribute to Responsible AI initiatives. The role also includes building scalable data pipelines and tools for continuous evaluation. | Eval Gate · Data | 8 |
| **Sr. Software Development Engineer, Automated Reasoning Group** Applies Automated Reasoning to verify generative AI outputs, specifically addressing hallucinations within AWS services. The role involves designing and building new services and capabilities at scale, contributing to the evolution of the Automated Reasoning Checks (ARc) service, and making automated reasoning more accessible within AWS. | Eval Gate | 7 |
| **AI Benchmarking Lead, Performance Benchmarking Evaluation** Ensures the quality and reliability of AI model evaluations for Amazon's Seller Assistant copilot. Primary responsibilities: benchmarking AI models, evaluating audits performed by a core auditing team, improving audit consistency, and enforcing quality standards. The goal is to scale AI model evaluation coverage and ensure high-quality outcomes for sellers. | Eval Gate | 7 |
| **Senior Applied Scientist, Fauna** Develops evaluation frameworks and data-collection protocols for robotic capabilities. The role involves designing how to measure, stress-test, and improve robot behavior, building infrastructure for teleoperation, evaluation, and learning, and analyzing results to identify performance gaps. It requires expertise in robotics, ML, and human-in-the-loop systems, with a focus on turning capability goals into measurable evaluation systems. | Eval Gate · Agent | 7 |
| **AI Benchmarking Specialist, SP Support - German, International Seller Growth** Evaluates AI systems, specifically LLMs, by designing and executing benchmarking and audit activities. Core responsibilities include assessing model quality, compliance, robustness, and fairness, handling annotations for training and measuring AI models, preparing audit reports, and ensuring data quality. | Eval Gate · Data | 7 |
| **AI Benchmarking Lead, Performance Benchmarking Evaluation** Ensures the quality and reliability of AI model evaluations for Amazon's Seller Assistant copilot. The role involves benchmarking AI models, evaluating audit processes, improving audit consistency, and enforcing quality standards to support scaling the product to a wider seller base. | Eval Gate | 7 |
| **Software Development Manager, Agentic AI - AgentCore** Leads the Evaluations & Optimization team in the Agentic AI organization at AWS: a team of engineers building systems to assess the quality, performance, and reliability of GenAI and agentic systems, as well as optimization solutions. The work involves deep learning, distributed systems, and evaluation science, focusing on infrastructure and tooling for evaluation workflows. | Eval Gate · Agent | 7 |
| **Applied Science Manager, Artificial General Intelligence, Quality Automation** Leads an AGI team focused on quality automation, auditing, and evaluation of LLMs and multimodal systems. Manages a team of scientists developing quality strategies and auditing frameworks, and researching new methodologies to ensure data integrity and model performance. Also handles team development and cross-functional communication, and drives research into data impact and utility measurement for AI models. | Eval Gate · Post-train | 7 |