Software Engineer, ML Data Infrastructure

Nuro · Robotics · CA · Offboard Infrastructure

Nuro is seeking a Software Engineer, ML Data Infrastructure, to build and maintain scalable data pipelines for autonomous driving systems. The role involves designing systems that ingest, process, store, and evaluate large volumes of data, including building data annotation tools and applying ML techniques to data discovery and label scaling. The focus is on robust data infrastructure that supports the advancement of ML models for self-driving technology.

What you'd actually do

  1. Design and develop unified, introspectable, large-scale batch and streaming data pipelines that can ingest and process data across a wide range of use cases relevant to evaluation.
  2. Create and implement a storage system capable of accommodating both the large volume and diverse range of evaluation and performance metrics.
  3. Construct intuitive dashboards and reports to present evaluation results, facilitating straightforward comparisons that highlight both improvements and regressions of the ML components and the overall system.
  4. Develop and maintain continuous testing and monitoring systems to guarantee the integrity and resilience of our data and associated data pipelines.
  5. Develop data mining tools with applied ML techniques to support data discovery needs across Autonomy, including Perception, Behavior, and Mapping.

Skills

Required

  • Python
  • large-scale data
  • scalable & reliable systems/data pipelines
  • complex systems design
  • deep dive into implementation
  • technical standards and best practices

Nice to have

  • C++
  • GCP
  • GCS
  • BigQuery
  • PostgreSQL
  • data engineering
  • batch and streaming data processing
  • warehousing
  • analytics solutions
  • large-scale distributed data systems
  • system & framework design
  • data workflow orchestration platforms

What the JD emphasized

  • large-scale batch and streaming data pipelines
  • data annotation tools
  • applied ML techniques

Other signals

  • ML data infrastructure
  • training and evaluation data
  • autonomous driving systems