Currently tracking 995 active AI roles, up 64% versus the prior 4 weeks. Primary focus: Agent · Engineering. Salary range $65k–$465k (avg $196k).
| Title | Stage | AI score |
|---|---|---|
| **Applied Scientist II, Alexa Sensitive Content Intelligence (ASCI)** This role focuses on building AI safety systems for Alexa, ensuring LLMs provide safe and trustworthy responses. It involves pioneering solutions in Responsible AI, designing automated testing systems, creating intelligent evaluation systems, building models that understand human values, and crafting AI agents for real-time detection and fixing of production issues. The role emphasizes frontier research with immediate real-world impact, aiming to set industry standards for responsible AI. | Eval Gate · Post-train | 9 |
| **Applied Science Manager, Sponsored Products and Brands** Leads the Continuous Model Evaluation and Learning workstream within Amazon Ads' Sponsored Products and Brands team, managing a team of applied scientists and engineers that builds and ships an evaluation and remediation framework for an agentic brand-intelligence system. This includes designing evaluation metrics, developing optimization engines for prompts and synthetic data, and ensuring offline-to-online consistency for quality improvements. The goal is to enable autonomous detect-diagnose-remediate loops that scale quality across brand skills. | Eval Gate · Agent | 8 |
| **Data Scientist, AWS Quick Data** The Data Scientist will focus on developing evaluation and benchmarking datasets for generative AI capabilities within the Amazon Quick Suite enterprise AI platform. This includes leveraging LLMs for synthetic data generation, creating ground truth datasets, leading human annotation initiatives, and contributing to Responsible AI efforts to ensure enterprise-readiness, safety, and effectiveness of AI at scale. | Eval Gate · Data | 8 |
| **Data Scientist, AWS Quick Data** The Data Scientist II will focus on developing evaluation and benchmarking datasets for enterprise AI features, specifically for Amazon Quick Suite. This involves leveraging Generative AI techniques, LLMs for synthetic data generation, and LLM-as-a-judge settings to assess model performance, ensure data quality, and contribute to Responsible AI initiatives. The role also includes building scalable data pipelines and tools for continuous evaluation. | Eval Gate · Data | 8 |
| **AI Principal Product Manager-Technical, Alexa Responsible AI** The AI Principal PMT for Alexa Responsible AI will define the standard for how Alexa earns and keeps customer trust. This role owns the product discipline of Responsible AI, defining customer experiences for safety guardrails, trust signals, and evaluation frameworks. The PMT will set product vision and strategy, lead cross-functional alignment across Applied Science, Engineering, Legal, Policy, and UX, and own the full responsible product experience, including safety, privacy, and security. The role requires technical depth in LLMs and AI safety: understanding how models fail and writing requirements for safety model development and evaluation system design. The PMT will also mentor other PMs and influence Responsible AI scaling across Alexa. | Eval Gate · Post-train | 8 |
| **Manager, Program Management, Alexa Sensitive Content Intelligence (ASCI)** Program management leader for the ASCI team, shaping how Alexa protects customers from harmful content using generative AI and responsible AI guardrails. The role involves strategic leadership, cross-functional program delivery, and team building, with a strong emphasis on data and LLM fluency, defining and executing roadmaps for responsible AI, and ensuring program execution through metrics and mechanisms. | Eval Gate · Agent | 7 |
| **Applied Scientist, AWS Automated Reasoning** Applied Scientist focused on automated reasoning, privacy, and sovereignty within AWS. The role involves solving complex problems, designing and implementing solutions, and providing cross-organizational technical influence. Requires a PhD, or a Master's with significant applied research experience, in areas such as SAT, SMT, theorem proving, symbolic simulation, program analysis, or type systems. Experience with languages such as OCaml, Dafny, Haskell, Lean, or Rust is preferred. | Eval Gate | 7 |
| **Sr. Software Development Engineer, Automated Reasoning Group** Senior engineer applying Automated Reasoning to verify Generative AI outputs, specifically addressing hallucinations within AWS services. The role involves designing and building new services and capabilities at scale, contributing to the evolution of the Automated Reasoning Checks (ARc) service, and making automated reasoning more accessible within AWS. | Eval Gate | 7 |
| **AI Benchmarking Lead, Performance Benchmarking Evaluation** This role focuses on ensuring the quality and reliability of AI model evaluations for Amazon's Seller Assistant copilot. The primary responsibilities involve benchmarking AI models, evaluating audits performed by a core auditing team, improving audit consistency, and enforcing quality standards. The goal is to scale AI model evaluation coverage and ensure high-quality outcomes for sellers. | Eval Gate | 7 |
| **Senior Applied Scientist, Fauna** Senior Applied Scientist focused on developing evaluation frameworks and data collection protocols for robotic capabilities. The role involves designing how to measure, stress-test, and improve robot behavior, building infrastructure for teleoperation, evaluation, and learning, and analyzing results to identify performance gaps. It requires expertise in robotics, ML, and human-in-the-loop systems, with a focus on turning capability goals into measurable evaluation systems. | Eval Gate · Agent | 7 |
| **AI Benchmarking Specialist, SP Support - German, International Seller Growth** This role focuses on evaluating AI systems, specifically LLMs, by designing and executing benchmarking and audit activities. The core responsibilities include assessing model quality, compliance, robustness, and fairness, as well as handling annotations for training and measuring AI models. The role also involves preparing audit reports and ensuring data quality. | Eval Gate · Data | 7 |
| **AI Benchmarking Lead, Performance Benchmarking Evaluation** The AI Benchmarking Lead will focus on ensuring the quality and reliability of AI model evaluations for Amazon's Seller Assistant copilot. This role involves benchmarking AI models, evaluating audit processes, improving audit consistency, and enforcing quality standards to support the scaling of the product to a wider seller base. | Eval Gate | 7 |
| **Software Development Manager, Agentic AI - AgentCore** Software Development Manager on the Agentic AI organization's Evaluations & Optimization team at AWS. The manager will lead a team of engineers to build systems for assessing the quality, performance, and reliability of GenAI and agentic systems, as well as optimization solutions. The work involves deep learning, distributed systems, and evaluation science, focusing on building infrastructure and tooling for evaluation workflows. | Eval Gate · Agent | 7 |
| **Applied Scientist, Artificial General Intelligence** The Applied Scientist will develop and maintain LLM-as-a-Judge systems (a minimal sketch of this pattern appears after the table) and auditing frameworks to ensure the quality of data used for training and evaluating Amazon Nova models, impacting LLM products and services. | Eval Gate · Post-train | 7 |
| **Applied Science Manager, Artificial General Intelligence, Quality Automation** Applied Science Manager for the AGI team, focusing on quality automation, auditing, and evaluation of LLMs and multimodal systems. Leads a team of scientists to develop quality strategies and auditing frameworks, and to research new methodologies that ensure data integrity and model performance. Manages team development and cross-functional communication, and drives research into data impact and utility measurement for AI models. | Eval Gate · Post-train | 7 |
| **AI Benchmarking Specialist, SP Support - Italian, International Seller Growth** This role focuses on evaluating AI systems, specifically LLMs, by designing and executing benchmarking and audit activities. It involves assessing model quality, compliance, robustness, and fairness, with a strong emphasis on handling annotations for training, measuring, and improving AI models. The role also includes preparing audit reports and ensuring data quality based on annotation guidelines. | Eval Gate · Data | 6 |
| **AI Benchmarking Specialist, SP Support - Spanish, International Seller Growth** This role focuses on evaluating AI systems, specifically LLMs, by designing and executing benchmarking and audit activities. It involves assessing model quality, compliance, robustness, and fairness, as well as handling annotations for training and measuring AI models. The role also includes preparing audit reports and ensuring data quality. | Eval Gate · Data | 5 |
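
Several of the roles above converge on the same LLM-as-a-judge evaluation pattern: a judge model scores another model's outputs against a rubric, and human audits keep the judge calibrated. As a rough illustration of that loop, here is a minimal sketch; the rubric, score scale, and `model` callable are assumptions for illustration, not any team's actual implementation.

```python
"""Minimal LLM-as-a-judge sketch.

Illustrative only: the rubric, 1-5 scale, and `model` callable are
assumptions, not a description of any Amazon team's implementation.
"""
import json
import re
from typing import Callable

JUDGE_PROMPT = """You are grading an AI assistant's answer.
Question: {question}
Answer: {answer}
Rate the answer's quality from 1 (poor) to 5 (excellent) and explain briefly.
Respond as JSON: {{"score": <int>, "reason": "<short explanation>"}}"""


def judge(question: str, answer: str, model: Callable[[str], str]) -> dict:
    """Ask a judge model to score one (question, answer) pair."""
    raw = model(JUDGE_PROMPT.format(question=question, answer=answer))
    # Judge models sometimes wrap JSON in prose; extract the first object.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        return {"score": None, "reason": "unparseable judge output"}
    verdict = json.loads(match.group(0))
    verdict["score"] = int(verdict["score"])
    return verdict


if __name__ == "__main__":
    # Stand-in judge so the sketch runs without an API key.
    fake_judge = lambda prompt: '{"score": 4, "reason": "accurate but terse"}'
    print(judge("What is 2+2?", "4", fake_judge))
```

The sketch only shows the core grade-and-parse step; the evaluation frameworks these listings describe would presumably add batching, retries, and calibration of judge scores against human audit labels, which is what the auditing and benchmarking roles above exist to do.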