Senior Applied Scientist, Fauna

Amazon · Big Tech · NY +1 · Research Science

Senior Applied Scientist role focused on developing evaluation frameworks and data collection protocols for robotic capabilities. The role involves designing how to measure, stress-test, and improve robot behavior; building infrastructure for teleoperation, evaluation, and learning; and analyzing results to identify performance gaps. It requires expertise in robotics, ML, and human-in-the-loop systems, with a focus on turning capability goals into measurable evaluation systems.

What you'd actually do

  1. Design and implement evaluation frameworks to measure robot capabilities across structured tasks, edge cases, and real-world scenarios
  2. Develop task definitions, success criteria, and benchmarking methodologies that enable consistent and reproducible evaluation of policies
  3. Create and refine data collection protocols that generate high-quality, task-relevant datasets aligned with model development needs
  4. Build and iterate on teleoperation workflows and operator interfaces to support efficient, reliable, and scalable data collection
  5. Analyze evaluation results and collected data to identify performance gaps, failure modes, and opportunities for targeted data collection
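To make items 1 and 2 concrete, an evaluation framework of the kind described above might pair each task definition with an explicit success criterion and run policies over seeded trials so results are reproducible. A minimal sketch follows; every name here (`EvalTask`, `run_suite`, the stubbed rollout) is illustrative, not taken from the posting or any actual Amazon system:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class EvalTask:
    """A task definition with an explicit, reproducible success criterion."""
    name: str
    # Maps a rollout's final observation dict to pass/fail.
    success_fn: Callable[[Dict], bool]
    n_trials: int = 10

@dataclass
class EvalResult:
    task: str
    successes: int
    trials: int

    @property
    def success_rate(self) -> float:
        return self.successes / self.trials

def run_suite(
    tasks: List[EvalTask],
    rollout_fn: Callable[[str, int], Dict],
) -> List[EvalResult]:
    """Run each task n_trials times, one fixed seed per trial, and aggregate."""
    results = []
    for task in tasks:
        successes = sum(
            task.success_fn(rollout_fn(task.name, seed))
            for seed in range(task.n_trials)
        )
        results.append(EvalResult(task.name, successes, task.n_trials))
    return results

# Toy usage with a stubbed rollout standing in for a real robot policy.
def fake_rollout(task_name: str, seed: int) -> Dict:
    return {"object_placed": seed % 2 == 0}

tasks = [EvalTask("pick_and_place", lambda obs: obs["object_placed"], n_trials=10)]
results = run_suite(tasks, fake_rollout)
print(results[0].success_rate)  # even seeds 0,2,4,6,8 succeed -> 0.5
```

The point of the structure is item 3's requirement: fixing seeds and pinning success criteria to task definitions is one way to make success rates comparable across policy versions, which is what lets performance gaps (item 5) be attributed to the policy rather than to evaluation noise.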

Skills

Required

  • Python, Java, C++
  • building machine learning models or developing algorithms for business applications
  • leading technical initiatives and key deliverables
  • deep learning and model development
  • robotics systems, control, or embodied AI
  • designing evaluation methodologies, benchmarks, or experimental frameworks for large-scale ML models or robotic systems
  • teleoperation systems, simulation environments, or human-in-the-loop data collection

Nice to have

  • managing and deploying ML products
  • patents or publications at top-tier peer-reviewed conferences or journals
  • leading research initiatives in robotics or foundation models

What the JD emphasized

  • evaluation frameworks
  • data collection protocols
  • robot behavior
  • human-in-the-loop systems
  • teleoperation
  • evaluation
  • learning
  • scalable and reliable data collection
  • measurable and actionable evaluation systems
  • evaluation methodologies
  • benchmarking methodologies
  • data collection
  • teleoperation workflows
  • operator interfaces
  • evaluation results
  • performance gaps
  • failure modes
  • targeted data collection
  • evaluation tooling
  • data pipelines
  • human-in-the-loop learning
  • technical initiatives
  • key deliverables
  • deep learning
  • model development
  • robotics systems
  • control
  • embodied AI
  • benchmarks
  • experimental frameworks
  • large-scale ML models
  • robotic systems
  • teleoperation systems
  • simulation environments
  • human-in-the-loop data collection
  • research initiatives
  • robotics
  • foundation models

Other signals

  • designing evaluation frameworks
  • measure, stress-test, and improve robot behavior
  • teleoperation, evaluation, and learning