Senior Applied Scientist, Fauna

Amazon · Big Tech · NY +1 · Research Science

Senior Applied Scientist role focused on developing evaluation frameworks and data collection protocols for robotic capabilities. The role involves designing how to measure, stress-test, and improve robot behavior; building infrastructure for teleoperation, evaluation, and learning; and analyzing results to identify performance gaps. It requires expertise in robotics, ML, and human-in-the-loop systems, with a focus on turning capability goals into measurable evaluation systems.

What you'd actually do

  1. Design and implement evaluation frameworks to measure robot capabilities across structured tasks, edge cases, and real-world scenarios
  2. Develop task definitions, success criteria, and benchmarking methodologies that enable consistent and reproducible evaluation of policies
  3. Create and refine data collection protocols that generate high-quality, task-relevant datasets aligned with model development needs
  4. Build and iterate on teleoperation workflows and operator interfaces to support efficient, reliable, and scalable data collection
  5. Analyze evaluation results and collected data to identify performance gaps, failure modes, and opportunities for targeted data collection
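To make items 1 and 2 concrete, an evaluation framework of the kind described above might pair each task definition with an explicit success criterion and run policies over seeded trials so results are reproducible. A minimal sketch follows; every name here (`EvalTask`, `run_suite`, the stubbed rollout) is illustrative, not taken from the posting or any actual Amazon system:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class EvalTask:
    """A task definition with an explicit, reproducible success criterion."""
    name: str
    # Maps a rollout's final observation dict to pass/fail.
    success_fn: Callable[[Dict], bool]
    n_trials: int = 10

@dataclass
class EvalResult:
    task: str
    successes: int
    trials: int

    @property
    def success_rate(self) -> float:
        return self.successes / self.trials

def run_suite(
    tasks: List[EvalTask],
    rollout_fn: Callable[[str, int], Dict],
) -> List[EvalResult]:
    """Run each task n_trials times, one fixed seed per trial, and aggregate."""
    results = []
    for task in tasks:
        successes = sum(
            task.success_fn(rollout_fn(task.name, seed))
            for seed in range(task.n_trials)
        )
        results.append(EvalResult(task.name, successes, task.n_trials))
    return results

# Toy usage with a stubbed rollout standing in for a real robot policy.
def fake_rollout(task_name: str, seed: int) -> Dict:
    return {"object_placed": seed % 2 == 0}

tasks = [EvalTask("pick_and_place", lambda obs: obs["object_placed"], n_trials=10)]
results = run_suite(tasks, fake_rollout)
print(results[0].success_rate)  # even seeds 0,2,4,6,8 succeed -> 0.5
```

The point of the structure is item 3's requirement: fixing seeds and pinning success criteria to task definitions is one way to make success rates comparable across policy versions, which is what lets performance gaps (item 5) be attributed to the policy rather than to evaluation noise.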

Skills

Required

  • Python, Java, C++
  • building machine learning models or developing algorithms for business applications
  • leading technical initiatives and key deliverables
  • deep learning and model development
  • robotics systems, control, or embodied AI
  • designing evaluation methodologies, benchmarks, or experimental frameworks for large-scale ML models or robotic systems
  • teleoperation systems, simulation environments, or human-in-the-loop data collection

Nice to have

  • managing and deploying ML products
  • patents or publications at top-tier peer-reviewed conferences or journals
  • leading research initiatives in robotics or foundation models

What the JD emphasized

  • evaluation frameworks
  • data collection protocols
  • robot behavior
  • human-in-the-loop systems
  • teleoperation
  • evaluation
  • learning
  • scalable and reliable data collection
  • measurable and actionable evaluation systems
  • evaluation methodologies
  • benchmarking methodologies
  • data collection
  • teleoperation workflows
  • operator interfaces
  • evaluation results
  • performance gaps
  • failure modes
  • targeted data collection
  • evaluation tooling
  • data pipelines
  • human-in-the-loop learning
  • technical initiatives
  • key deliverables
  • deep learning
  • model development
  • robotics systems
  • control
  • embodied AI
  • benchmarks
  • experimental frameworks
  • large-scale ML models
  • robotic systems
  • teleoperation systems
  • simulation environments
  • human-in-the-loop data collection
  • research initiatives
  • robotics
  • foundation models

Other signals

  • designing evaluation frameworks
  • measure, stress-test, and improve robot behavior
  • teleoperation, evaluation, and learning