Applied Scientist, Fauna

Amazon · Big Tech · NY +1 · Research Science

This role focuses on developing evaluation frameworks and data collection protocols for robotic capabilities, bridging robotics, ML, and human-in-the-loop systems. The scientist will design evaluation methodologies, create data collection protocols, build teleoperation workflows, and analyze results to improve robot behavior and dataset generation.

What you'd actually do

  1. Design and implement evaluation frameworks to measure robot capabilities across structured tasks, edge cases, and real-world scenarios
  2. Develop task definitions, success criteria, and benchmarking methodologies that enable consistent and reproducible evaluation of policies
  3. Create and refine data collection protocols that generate high-quality, task-relevant datasets aligned with model development needs
  4. Build and iterate on teleoperation workflows and operator interfaces to support efficient, reliable, and scalable data collection
  5. Analyze evaluation results and collected data to identify performance gaps, failure modes, and opportunities for targeted data collection

Skills

Required

  • PhD, or a Master's degree plus 4+ years of experience in CS, CE, ML, or a related field
  • Patents or publications at top-tier peer-reviewed conferences or journals
  • Experience programming in Java, C++, Python, or a related language
  • Experience in any of the following areas: algorithms and data structures, parsing, numerical optimization, data mining, parallel and distributed computing, high-performance computing

Nice to have

  • Experience using Unix/Linux
  • Experience in professional software development

What the JD emphasized

  • building the infrastructure and methodologies that connect teleoperation, evaluation, and learning
  • developing evaluation policies, defining task structures, and contributing to operator-facing interfaces
  • designing how we measure, stress-test, and improve robot behavior across a wide range of real-world tasks
  • experience building models for business applications
  • patents or publications at top-tier peer-reviewed conferences or journals