Senior Software Engineer, Performance Tooling and Infrastructure

Nuro · Robotics · CA · Fleet Infrastructure

Nuro is seeking a Senior Software Engineer to own the infrastructure for its performance simulation platform. The platform validates autonomy code changes (ML models, map data, trajectories) on actual robot compute hardware before road deployment. The role covers job orchestration, reliability, observability, data pipelines, and statistical analysis for benchmarking. It requires deep systems and infrastructure knowledge, Python and C++ proficiency, and hands-on Linux experience; familiarity with cloud-native infrastructure (Kubernetes, GCP) is a plus. The engineer will set the technical vision, manage the roadmap, and collaborate across teams so that the platform keeps scaling, since it gates release velocity for the entire autonomy stack.

What you'd actually do

  1. Develop and maintain the job orchestration layer that schedules, executes, and validates autonomy performance benchmarks across a fleet of physical bench-top systems — integrated into CI/CD pipelines as merge-blocking and release-blocking quality gates.
  2. Build monitoring, alerting, and self-healing automation for the bench fleet. Proactively identify systemic risks — capacity bottlenecks, hardware degradation patterns, infrastructure single points of failure — before they become outages. Track utilization, failure rates, and capacity trends to ensure the platform scales ahead of organizational demand.
  3. Design and build end-to-end data pipelines that capture fine-grained performance metrics (CPU/GPU utilization, memory bandwidth, E2E latency, scheduling jitter) from bench-top runs, process them at scale, and surface actionable insights through dashboards and automated regression detection.
  4. Work with Data Science to develop rigorous experimentation methodology for performance results from non-deterministic autonomy workloads — including variance analysis, significance testing, and regression detection.
  5. Guide the SRE team through the OS and system-level configuration of bench hardware — including Linux kernel tuning, boot infrastructure, networking, and hardware bring-up — ensuring the platform faithfully reproduces production robot compute behavior.
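
To make item 4 concrete, here is a minimal sketch of one way to detect a regression in noisy benchmark results: a one-sided permutation test on mean latency, combined with a minimum effect size. The posting does not describe Nuro's actual methodology, so the function names, the thresholds (`alpha`, `min_effect_ms`), and the sample data are all hypothetical.

```python
import random
import statistics


def permutation_p_value(baseline, candidate, n_resamples=5000, seed=0):
    """One-sided permutation test: is the candidate's mean latency higher?"""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same answer
    observed = statistics.mean(candidate) - statistics.mean(baseline)
    pooled = list(baseline) + list(candidate)
    n_base = len(baseline)
    hits = 0
    for _ in range(n_resamples):
        # Shuffle the labels: under the null hypothesis they are interchangeable.
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[n_base:]) - statistics.mean(pooled[:n_base])
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)


def is_regression(baseline, candidate, alpha=0.01, min_effect_ms=0.5):
    """Flag a slowdown only if it is both practically and statistically significant.

    alpha and min_effect_ms are illustrative defaults, not values from the posting.
    """
    effect = statistics.mean(candidate) - statistics.mean(baseline)
    if effect < min_effect_ms:
        return False
    return permutation_p_value(baseline, candidate) < alpha


if __name__ == "__main__":
    # Hypothetical end-to-end latencies (ms) from repeated runs on one bench.
    baseline_ms = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.3, 10.1]
    candidate_ms = [11.0, 11.2, 10.9, 11.1, 11.0, 10.8, 11.3, 11.1]
    print("regression:", is_regression(baseline_ms, candidate_ms))
```

A permutation test is used here because it makes no assumption about the shape of the latency distribution, which suits non-deterministic workloads. The effect-size floor keeps a merge-blocking gate from firing on differences that are statistically detectable but too small to matter.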

Skills

Required

  • Python
  • C++
  • Linux systems
  • networking
  • storage
  • compute
  • data pipelines
  • ingestion
  • transformation
  • storage
  • visualization
  • SQL
  • analytical workflows
  • job orchestration
  • CI/CD
  • performance metrics
  • statistical analysis
  • experimentation
  • technical leadership
  • roadmap setting
  • stakeholder alignment

Nice to have

  • Kubernetes
  • GCP
  • BigQuery
  • Grafana
  • NVIDIA Thor platform
  • agentic tooling (e.g., Claude Code)

What the JD emphasized

  • must be validated for real-time performance on actual robot compute hardware before it reaches the road
  • merge-blocking and release-blocking quality gates
  • technical DRI for the platform
  • setting the roadmap
  • making architectural calls
  • representing the platform's needs to the leadership team
  • ensuring the system scales through multiple hardware generations
  • Drive Platform & Allocation Strategy
  • negotiate hardware allocation
  • model utilization scenarios under real-world constraints
  • present data-backed trade-off recommendations
  • balancing testing coverage, user throughput, cost, and SLA commitments against finite physical resources
  • 5+ years of industry software engineering experience
  • Strong proficiency in Python
  • working proficiency in C++
  • Deep comfort with Linux systems
  • you've configured kernels, debugged boot issues, written systemd units, or managed bare-metal infrastructure
  • Experience setting technical vision and roadmap for a project or platform
  • driving alignment across multiple stakeholders
  • independently identified the cross-functional partners needed to unblock and deliver
  • briefed senior engineering leadership on trade-offs and recommendations