Data Platform Engineer, Fauna

Amazon · Big Tech · NY +1 · Business Intelligence

This role builds the foundational data platform for robotics and ML development, focusing on data pipelines, storage, and transformation to support ML training and fleet monitoring.

What you'd actually do

  1. Design and build scalable data pipelines for ingesting and processing robotics data (sensor streams, video, telemetry, logs)
  2. Develop and maintain data storage solutions optimized for diverse data types and access patterns
  3. Create tools and APIs for researchers and engineers to efficiently query and analyze large datasets
  4. Build real-time data processing systems for monitoring robot fleet performance
  5. Build and maintain data transformation pipelines that prepare robotics data for ML training

Skills

Required

  • Bachelor's degree or above in computer science, computer engineering, or a related field, or experience in data science, machine learning, or data mining
  • 3+ years of data engineering experience
  • Experience in scripting for automation (e.g., Python)
  • Advanced SQL skills
  • Experience in Kafka, or experience in Hive/Spark/HBase/YARN
  • Experience in software development
  • Experience with cloud computing technologies
  • Knowledge of distributed systems as they pertain to data storage and computing
  • Proficiency with data storage technologies (e.g., PostgreSQL, object storage)

Nice to have

  • Experience working with robotics or IoT data (time-series, video, point clouds)
  • Knowledge of streaming architectures and real-time analytics
  • Familiarity with ML techniques and how data preparation impacts model training
  • Experience with data cataloging, metadata management, and data discovery tools

What the JD emphasized

  • data engineering experience
  • scripting for automation (e.g., Python)
  • advanced SQL skills
  • Kafka, or experience in Hive/Spark/HBase/YARN
  • software development
  • cloud computing technologies
  • distributed systems as they pertain to data storage and computing
  • data storage technologies (e.g., PostgreSQL, object storage)
  • robotics or IoT data (time-series, video, point clouds)
  • streaming architectures and real-time analytics
  • ML techniques and how data preparation impacts model training

Other signals

  • build foundational data systems
  • powering robotics and machine learning development
  • design and implement infrastructure for collecting, storing, processing, and transforming data
  • data is accessible, well-structured, and ready for training
  • enable research teams to iterate faster
  • prepare robotics data for ML training