What you'd actually do

Design and implement evaluation frameworks to measure robot capabilities across structured tasks, edge cases, and real-world scenarios

Develop task definitions, success criteria, and benchmarking methodologies that enable consistent and reproducible evaluation of policies

Create and refine data collection protocols that generate high-quality, task-relevant datasets aligned with model development needs

Build and iterate on teleoperation workflows and operator interfaces to support efficient, reliable, and scalable data collection

Analyze evaluation results and collected data to identify performance gaps, failure modes, and opportunities for targeted data collection

Skills

Required

PhD, or Master's degree and 4+ years of experience in CS, CE, ML, or a related field
Patents or publications at top-tier peer-reviewed conferences or journals
Experience programming in Java, C++, Python, or a related language
Experience in any of the following areas: algorithms and data structures, parsing, numerical optimization, data mining, parallel and distributed computing, high-performance computing

Nice to have

Experience using Unix/Linux
Experience in professional software development

What the JD emphasized

building the infrastructure and methodologies that connect teleoperation, evaluation, and learning

developing evaluation policies, defining task structures, and contributing to operator-facing interfaces

designing how we measure, stress-test, and improve robot behavior across a wide range of real-world tasks

experience building models for business applications

patents or publications at top-tier peer-reviewed conferences or journals

We are seeking an Applied Scientist to lead the development of evaluation frameworks and data collection protocols for robotic capabilities. In this role, you will focus on designing how we measure, stress-test, and improve robot behavior across a wide range of real-world tasks. Your work will play a critical role in shaping how policies are validated and how high-quality datasets are generated to accelerate system performance. You will operate at the intersection of robotics, machine learning, and human-in-the-loop systems, building the infrastructure and methodologies that connect teleoperation, evaluation, and learning. This includes developing evaluation policies, defining task structures, and contributing to operator-facing interfaces that enable scalable and reliable data collection. The ideal candidate is highly experimental, systems-oriented, and comfortable working across software, robotics, and data pipelines, with a strong focus on turning ambiguous capability goals into measurable and actionable evaluation systems.

Key job responsibilities

Design and implement evaluation frameworks to measure robot capabilities across structured tasks, edge cases, and real-world scenarios
Develop task definitions, success criteria, and benchmarking methodologies that enable consistent and reproducible evaluation of policies
Create and refine data collection protocols that generate high-quality, task-relevant datasets aligned with model development needs
Build and iterate on teleoperation workflows and operator interfaces to support efficient, reliable, and scalable data collection
Analyze evaluation results and collected data to identify performance gaps, failure modes, and opportunities for targeted data collection
Collaborate with engineering teams to integrate evaluation tooling, logging systems, and data pipelines into the broader robotics stack
Stay current with advances in robotics, evaluation methodologies, and human-in-the-loop learning to continuously improve internal approaches
Lead technical projects from conception through production deployment
Mentor junior scientists and engineers

Basic Qualifications

3+ years of experience building models for business applications
PhD, or Master's degree and 4+ years of experience in CS, CE, ML, or a related field
Patents or publications at top-tier peer-reviewed conferences or journals
Experience programming in Java, C++, Python, or a related language
Experience in any of the following areas: algorithms and data structures, parsing, numerical optimization, data mining, parallel and distributed computing, high-performance computing

Preferred Qualifications

Experience using Unix/Linux
Experience in professional software development

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.

The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at https://amazon.jobs/en/benefits.

USA, NY, New York - 172,400.00 - 223,400.00 USD annually

Applied Scientist, Fauna
