Senior Autonomy Engineer - Data Curation

Skydio · Defense · San Mateo, CA +1 · R&D

Senior Engineer on the Autonomy Data Curation team, responsible for building a data flywheel: collecting data from drone fleets, transforming it into high-quality, model-ready datasets for Autonomy teams, and building tooling for data discovery and slicing.

What you'd actually do

  1. Build and operate pipelines that transform raw autonomy logs and media into curated datasets, with strong observability and clear ownership so that curated data is broadly reusable.
  2. Build tooling that makes data discovery and slicing fast and self-serve for Autonomy teams. For example: media search tooling, and hard-mining loops with infrastructure that automatically routes data to annotation.
  3. Improve dataset quality and repeatability: versioning, provenance, and automated checks.
  4. Apply privacy and security requirements throughout our processes (access controls, retention, redaction/anonymization).
  5. Build with a data-driven, impact-forward mindset, including dashboards that highlight cost, dataset balance, and audit details.

Skills

Required

  • 5+ years of professional software engineering experience
  • Strong programming proficiency (Python/C++)
  • Hands-on experience building data pipelines for large-scale datasets (ETL/ELT, streaming or batch, orchestration)
  • Experience with data modeling, schema evolution, and dataset/version management
  • Solid understanding of reliability engineering: monitoring, incident response, backfills, and operational rigor
  • Ability to work across ambiguous interfaces and drive decisions

Nice to have

  • Experience with autonomy/robotics data: flight logs, self-driving car data, sensor fusion traces, video, geospatial metadata.
  • Experience with labeling workflows, annotation tooling, and labeling QA at scale.
  • Familiarity with privacy concepts (PII handling, redaction, access control, audit logs).
  • Experience with vector/semantic search over media or telemetry.
  • Experience building hard-mining evaluation loops for ML models.

What the JD emphasized

  • significant ownership of production systems
  • large-scale datasets
  • data modeling
  • schema evolution
  • dataset/version management
  • reliability engineering
  • monitoring
  • incident response
  • backfills
  • operational rigor
  • ambiguous interfaces
  • autonomy/robotics data
  • flight logs
  • self-driving car data
  • sensor fusion traces
  • video
  • geospatial metadata
  • labeling workflows
  • annotation tooling
  • labeling QA at scale
  • privacy concepts
  • PII handling
  • redaction
  • access control
  • audit logs
  • vector/semantic search
  • hard-mining evaluation loops
  • generative AI coding
  • agentic workflows

Other signals

  • data flywheel
  • model-ready datasets
  • training and model development
  • pipelines
  • large-scale datasets