Senior Autonomy Engineer, Data Curation

Skydio · Defense · San Mateo, CA +1 · R&D

Senior Engineer on the Autonomy Data Curation team responsible for building a data flywheel to transform raw autonomy logs and media into high-quality, model-ready datasets for Autonomy teams. This involves building and operating data pipelines, developing data discovery and slicing tooling, improving dataset quality, and applying privacy/security requirements.

What you'd actually do

  1. Build and operate pipelines that transform raw autonomy logs and media into curated datasets, with strong observability and clear ownership, so that curated data is more broadly reusable.
  2. Build tooling that makes data discovery and slicing fast and self-serve for Autonomy teams. Examples include media search tooling and hard-mining loops, with infrastructure for auto-routing data to annotation.
  3. Improve dataset quality and repeatability: versioning, provenance, and automated checks.
  4. Apply privacy and security requirements throughout our processes (access controls, retention, redaction/anonymization).
  5. Build with a data-driven, impact-forward mindset, using dashboards that highlight cost, dataset balance, and audit details.

Skills

Required

  • Python
  • C++
  • ETL/ELT
  • streaming or batch processing
  • orchestration
  • data modeling
  • schema evolution
  • dataset version management
  • reliability engineering
  • monitoring
  • incident response

Nice to have

  • autonomy/robotics data
  • flight logs
  • self-driving car data
  • sensor fusion traces
  • video
  • geospatial metadata
  • labeling workflows
  • annotation tooling
  • labeling QA at scale
  • privacy concepts
  • PII handling
  • redaction
  • access control
  • audit logs
  • vector/semantic search over media or telemetry
  • hard-mining evaluation loops for ML models

What the JD emphasized

  • 5+ years of professional software engineering experience (or equivalent), with significant ownership of production systems.
  • Strong proficiency in programming, demonstrable in at least one of our most frequently used languages (Python/C++).
  • Hands-on experience building data pipelines for large-scale datasets (ETL/ELT, streaming or batch, orchestration).
  • Experience with data modeling, schema evolution, and dataset/version management.
  • Solid understanding of reliability engineering: monitoring, incident response, backfills, and operational rigor.

Other signals

  • data flywheel
  • model-ready datasets
  • training and model development
  • large-scale datasets
  • dataset quality and repeatability