Currently tracking 1110 active AI roles, down 16% versus the prior 4 weeks. Primary focus: Agent · Engineering. Salary range $65k–$465k (avg $194k).
Amazon has 1472 active AI-related job listings. The company is heavily focused on roles within the "agents" stage, which accounts for 38% of its AI hiring, followed by "application" at 26%. Engineering is the dominant function, with 1172 positions. Over the last 30 days, Amazon has added 667 new AI roles, representing a 74% increase compared to the previous 30-day period. Frequent tech tags include agent_orchestration, model_serving, and multimodal.
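The growth figure above can be sanity-checked with a short sketch of how a period-over-period percentage change is conventionally computed. The function names are illustrative, and the implied prior-period count is a back-calculation from the two quoted numbers (667 new roles, 74% increase), not a value reported in the index.

```python
def percent_change(current: float, previous: float) -> float:
    """Return the percentage change from `previous` to `current`."""
    if previous == 0:
        raise ValueError("previous period must be non-zero")
    return (current - previous) / previous * 100.0


def implied_previous(current: float, pct_increase: float) -> float:
    """Back out the prior-period value from the current value and a % increase."""
    return current / (1.0 + pct_increase / 100.0)


# Figures quoted in the text: 667 new roles, described as a 74% increase.
prior = implied_previous(667, 74)           # derived estimate, not a reported figure
check = percent_change(667, round(prior))   # should land close to 74
```

Under these assumptions the prior 30-day period would have had roughly 383 new roles; rounding in the published percentage means the true count could differ by a few roles.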
Amazon currently has 1573 active AI-related roles in our index. The most common open titles are: ML Data Associate-II (9); 2026 Applied Scientist Intern, Amazon University Talent Acquisition (8); AI Data Associate (Dutch), Artificial General Intelligence Data Services (8); Software Development Engineer, AWS (8); Senior Delivery Consultant - Data, Professional Services, AWSI HCLS (7). Most positions are in Engineering and Research.
Amazon's active AI hiring is concentrated in: agents (41%), application (26%), serving infrastructure (13%). These categories follow a seven-stage AI lifecycle: data, pre-training, post-training, serving infrastructure, agents, evaluation, and application.
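As a rough illustration, the seven-stage lifecycle can be modelled as an enumeration, with the three quoted shares used to work out what is left for the other four stages. The class and variable names are hypothetical, and the calculation assumes each role is counted under exactly one stage so that shares sum to 100%, which the text does not confirm.

```python
from enum import Enum


class Stage(Enum):
    """The seven lifecycle stages named in the text."""
    DATA = "data"
    PRE_TRAINING = "pre-training"
    POST_TRAINING = "post-training"
    SERVING_INFRASTRUCTURE = "serving infrastructure"
    AGENTS = "agents"
    EVALUATION = "evaluation"
    APPLICATION = "application"


# Shares quoted in the text, in percent.
reported_share = {
    Stage.AGENTS: 41,
    Stage.APPLICATION: 26,
    Stage.SERVING_INFRASTRUCTURE: 13,
}

# Assumption: stages are mutually exclusive, so shares sum to 100.
remainder = 100 - sum(reported_share.values())
unreported = [s for s in Stage if s not in reported_share]
```

On that assumption, the remaining 20% of roles would be spread across data, pre-training, post-training, and evaluation. If roles can carry more than one stage label, the shares may overlap and this remainder would not hold.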
Amazon is hiring AI talent in: United States (1023 roles), Canada (59 roles), United Kingdom (47 roles), India (23 roles).
Job postings at Amazon most frequently mention: Machine Learning, Generative AI, Large Language Models (LLMs), Software Engineering, Agentic Systems.
In the past 30 days, Amazon has posted 696 new AI-related roles.
| Title and summary | Stage | AI score |
|---|---|---|
| AI Language Engineer, Alexa for Shopping: AI Language Engineer for Amazon's Conversational Shopping team, focusing on developing and implementing LLM-assisted evaluation tools and processes to improve AI-driven shopping experiences. The role involves creating automated verification scripts, annotation guidelines, and quality metrics, collaborating with cross-functional teams to ensure high-quality editorial data and product outcomes. | Eval Gate, Data | 8 |
| Applied Science Manager, Sponsored Products and Brands: Manager for a Continuous Model Evaluation and Learning workstream within Amazon Ads' Sponsored Products and Brands team. The role involves leading a team of applied scientists and engineers to build and ship an evaluation and remediation framework for an agentic brand-intelligence system. This includes designing evaluation metrics, developing optimization engines for prompts and synthetic data, and ensuring offline-to-online consistency for quality improvements. The goal is to enable autonomous detect-diagnose-remediate loops to scale quality across brand skills. | Eval Gate, Agent | 8 |
| Data Scientist, AWS Quick Data: The Data Scientist will focus on developing evaluation and benchmarking datasets for generative AI capabilities within the Amazon Quick Suite enterprise AI platform. This includes leveraging LLMs for synthetic data generation, creating ground truth datasets, leading human annotation initiatives, and contributing to Responsible AI efforts to ensure enterprise-readiness, safety, and effectiveness of AI at scale. | Eval Gate, Data | 8 |
| Data Scientist, AWS Quick Data: The Data Scientist II will focus on developing evaluation and benchmarking datasets for enterprise AI features, specifically for Amazon Quick Suite. This involves leveraging Generative AI techniques, LLMs for synthetic data generation, and LLM-as-a-judge settings to assess model performance, ensure data quality, and contribute to Responsible AI initiatives. The role also includes building scalable data pipelines and tools for continuous evaluation. | Eval Gate, Data | 8 |
| Data Scientist, AWS Quick Data: The Data Scientist will focus on developing evaluation and benchmarking datasets for generative AI capabilities within the Amazon Quick Suite enterprise AI platform. This includes leveraging LLMs for synthetic data generation, creating ground truth datasets, leading human annotation initiatives, and contributing to Responsible AI efforts to ensure enterprise-readiness, safety, and effectiveness of AI at scale. | Eval Gate, Data | 8 |
| Data Scientist, Network Fabric Engineering: Data Scientist role focused on defining and driving the data science strategy for network operations automation, including agentic systems. The role involves defining metrics, building risk and reliability models, and evaluating the performance of automation and AI systems to improve network availability. It emphasizes statistical rigor and evidence-based decision-making within a team of network and software engineers. | Eval Gate, Agent | 7 |
| Software Development Engineer 2: Software Development Engineer 2 for Amazon's Risk Management Team, focusing on evaluating bad actor risk across various entities. The role involves building self-service capabilities for ML model experimentation and managing risk use cases at a global scale, aiming to automate risk decisions and reduce fraud. | Eval Gate, Agent | 7 |
| Applied Scientist, Fauna: This role focuses on developing evaluation frameworks and data collection protocols for robotic capabilities, bridging robotics, ML, and human-in-the-loop systems. The scientist will design evaluation methodologies, create data collection protocols, build teleoperation workflows, and analyze results to improve robot behavior and dataset generation. | Eval Gate, Data | 7 |
| Software Development Engineer Test, Alexa Global quality: Software Development Engineer in Test focused on quality assurance automation and framework creation for Alexa's global, multilingual, and multimodal experiences. The role involves building agentic automation tooling for end-to-end quality evaluation, including synthetic test generation, LLM-as-a-Judge, and visual/cultural validation using ML. | Eval Gate, Agent | 7 |
| Sr. Software Development Engineer, Automated Reasoning Group: Senior Software Development Engineer role focused on applying Automated Reasoning to verify Generative AI outputs, specifically addressing hallucinations within AWS services. The role involves designing and building new services and capabilities at scale, contributing to the evolution of the Automated Reasoning Checks (ARc) service, and making automated reasoning more accessible within AWS. | Eval Gate | 7 |
| AI Benchmarking Lead, Performance Benchmarking Evaluation: This role focuses on ensuring the quality and reliability of AI model evaluations for Amazon's Seller Assistant copilot. The primary responsibilities involve benchmarking AI models, evaluating audits performed by a core auditing team, improving audit consistency, and enforcing quality standards. The goal is to scale AI model evaluation coverage and ensure high-quality outcomes for sellers. | Eval Gate | 7 |
| AI Benchmarking Lead, Performance Benchmarking Evaluation: This role focuses on ensuring the quality and reliability of AI model evaluations for Amazon's Seller Assistant copilot. The primary responsibilities involve benchmarking AI models, evaluating audits performed by a core auditing team, improving audit consistency, and enforcing quality standards. The goal is to scale AI model evaluation coverage and ensure high-quality outcomes for sellers. | Eval Gate | 7 |
| AI Benchmarking Lead, Performance Benchmarking Evaluation: This role focuses on ensuring the quality and reliability of AI model evaluations for Amazon's Seller Assistant copilot. The primary responsibilities involve benchmarking AI models, evaluating audits performed by a core auditing team, improving audit consistency, and enforcing quality standards. The goal is to scale AI model evaluation coverage and ensure high-quality outcomes for sellers. | Eval Gate | 7 |
| AI Benchmarking Lead, Performance Benchmarking Evaluation: This role focuses on ensuring the quality and reliability of AI model evaluations for Amazon's Seller Assistant copilot. The primary responsibilities involve benchmarking AI models, evaluating audits performed by a core auditing team, improving audit consistency, and enforcing quality standards. The goal is to scale AI model evaluation coverage and ensure high-quality outcomes for sellers. | Eval Gate | 7 |
| AI Benchmarking Lead, Performance Benchmarking Evaluation: This role focuses on ensuring the quality and reliability of AI model evaluations for Amazon's Seller Assistant copilot. The primary responsibilities involve benchmarking AI models, evaluating audits performed by a core auditing team, improving audit consistency, and enforcing quality standards. The goal is to scale AI model evaluation coverage and ensure high-quality outcomes for sellers. | Eval Gate | 7 |
| AI Benchmarking Lead, Performance Benchmarking Evaluation: This role focuses on ensuring the quality and reliability of AI model evaluations for Amazon's Seller Assistant copilot. The primary responsibilities involve benchmarking AI models, evaluating audits performed by a core auditing team, improving audit consistency, and enforcing quality standards. The goal is to scale AI model evaluation coverage and ensure high-quality outcomes for sellers. | Eval Gate | 7 |
| AI Benchmarking Lead, Performance Benchmarking Evaluation: This role focuses on ensuring the quality and reliability of AI model evaluations for Amazon's Seller Assistant copilot. The primary responsibilities involve benchmarking AI models, evaluating audits performed by a core auditing team, improving audit consistency, and enforcing quality standards. The goal is to scale AI model evaluation coverage and ensure high-quality outcomes for sellers. | Eval Gate | 7 |
| Senior Applied Scientist, Fauna: Senior Applied Scientist role focused on developing evaluation frameworks and data collection protocols for robotic capabilities. The role involves designing how to measure, stress-test, and improve robot behavior, building infrastructure for teleoperation, evaluation, and learning, and analyzing results to identify performance gaps. It requires expertise in robotics, ML, and human-in-the-loop systems, with a focus on turning capability goals into measurable evaluation systems. | Eval Gate, Agent | 7 |
| AI Benchmarking Lead, Performance Benchmarking Evaluation: This role focuses on ensuring the quality and reliability of AI model evaluations for Amazon's Seller Assistant copilot. The primary responsibilities involve benchmarking AI models, evaluating audits performed by a core auditing team, improving audit consistency, and enforcing quality standards. The goal is to scale AI model evaluation coverage and ensure high-quality outcomes for sellers. | Eval Gate | 7 |
| AI Benchmarking Lead, Performance Benchmarking Evaluation: This role focuses on ensuring the quality and reliability of AI model evaluations for Amazon's Seller Assistant copilot. The primary responsibilities involve benchmarking AI models, evaluating audits performed by a core auditing team, improving audit consistency, and enforcing quality standards. The goal is to scale AI model evaluation coverage and ensure high-quality outcomes for sellers. | Eval Gate | 7 |
| AI Benchmarking Lead, Performance Benchmarking Evaluation: This role focuses on ensuring the quality and reliability of AI model evaluations for Amazon's Seller Assistant copilot. The primary responsibilities involve benchmarking AI models, evaluating audits performed by a core auditing team, improving audit consistency, and enforcing quality standards. The goal is to scale AI model evaluation coverage and ensure high-quality outcomes for sellers. | Eval Gate | 7 |
| AI Benchmarking Specialist, SP Support - German, International Seller Growth: This role focuses on evaluating AI systems, specifically LLMs, by designing and executing benchmarking and audit activities. The core responsibilities include assessing model quality, compliance, robustness, and fairness, as well as handling annotations for training and measuring AI models. The role also involves preparing audit reports and ensuring data quality. | Eval Gate, Data | 7 |
| Software Development Manager, Agentic AI - AgentCore: This role is for a Software Development Manager on the Agentic AI organization's Evaluations & Optimization team at AWS. The manager will lead a team of engineers to build systems for assessing the quality, performance, and reliability of GenAI and agentic systems, as well as optimization solutions. The work involves deep learning, distributed systems, and evaluation science, focusing on building infrastructure and tooling for evaluation workflows. | Eval Gate, Agent | 7 |
| AI Benchmarking Lead, Performance Benchmarking Evaluation: The AI Benchmarking Lead will focus on ensuring the quality and reliability of AI model evaluations for Amazon's Seller Assistant copilot. This role involves benchmarking AI models, evaluating audit processes, improving audit consistency, and enforcing quality standards to support the scaling of the product to a wider seller base. | Eval Gate | 7 |
| Applied Science Manager, Artificial General Intelligence, Quality Automation: Applied Science Manager for AGI team focusing on quality automation, auditing, and evaluation of LLMs and multimodal systems. Leads a team of scientists to develop quality strategies, auditing frameworks, and research new methodologies to ensure data integrity and model performance. Manages team development, cross-functional communication, and drives research into data impact and utility measurement for AI models. | Eval Gate, Post-train | 7 |