Software Engineer, Performance Tooling and Infrastructure

Nuro · Robotics · CA · Fleet Infrastructure

Nuro is seeking a Software Engineer to own the performance simulation platform infrastructure for its self-driving technology. The role involves developing and maintaining systems that benchmark autonomy code changes and validate their real-time performance on actual robot compute hardware. Responsibilities include building benchmarking infrastructure, ensuring platform reliability and observability, designing data pipelines for performance metrics, conducting statistical analysis, guiding bare-metal OS configuration, and driving platform strategy. The role requires strong Python and working C++ proficiency, data engineering experience, deep Linux systems knowledge, and technical leadership capabilities. Although the role supports AI development, it focuses on infrastructure and tooling rather than core AI/ML model development.

What you'd actually do

  1. Develop and maintain the job orchestration layer that schedules, executes, and validates autonomy performance benchmarks across a fleet of physical bench-top systems — integrated into CI/CD pipelines as merge-blocking and release-blocking quality gates.
  2. Build monitoring, alerting, and self-healing automation for the bench fleet. Proactively identify systemic risks — capacity bottlenecks, hardware degradation patterns, infrastructure single points of failure — before they become outages. Track utilization, failure rates, and capacity trends to ensure the platform scales ahead of organizational demand.
  3. Design and build end-to-end data pipelines that capture fine-grained performance metrics (CPU/GPU utilization, memory bandwidth, E2E latency, scheduling jitter) from bench-top runs, process them at scale, and surface actionable insights through dashboards and automated regression detection.
  4. Work with Data Science to develop a rigorous experimentation methodology for analyzing performance results from non-deterministic autonomy workloads, including variance analysis, significance testing, and regression detection.
  5. Guide the SRE team through the OS and system-level configuration of bench hardware — including Linux kernel tuning, boot infrastructure, networking, and hardware bring-up — ensuring the platform faithfully reproduces production robot compute behavior.

Skills

Required

  • Python
  • C++
  • Linux systems
  • data pipelines
  • SQL
  • networking
  • storage
  • compute
  • technical leadership
  • roadmap setting
  • stakeholder alignment

Nice to have

  • Kubernetes
  • GCP
  • BigQuery
  • Grafana
  • NVIDIA Thor platform
  • systemd units
  • kernel tuning
  • boot infrastructure
  • hardware bring-up
  • statistical analysis
  • experimentation methodology
  • variance analysis
  • significance testing
  • agentic tooling

What the JD emphasized

  • must be validated for real-time performance on actual robot compute hardware
  • merge-blocking and release-blocking quality gates
  • technical DRI for the platform
  • setting the roadmap
  • making architectural calls
  • representing the platform's needs to the leadership team
  • ensuring the system scales through multiple hardware generations
  • proactively identify systemic risks before they become outages
  • scales ahead of organizational demand
  • surface actionable insights
  • automated regression detection
  • rigorous experimentation methodology
  • faithfully reproduces production robot compute behavior
  • Own the planning lifecycle for the benchmarking fleet across hardware generations
  • Partner with engineering and program leadership to negotiate hardware allocation
  • model utilization scenarios under real-world constraints
  • present data-backed trade-off recommendations
  • balancing testing coverage, user throughput, cost, and SLA commitments against finite physical resources
  • translate their performance analysis needs into robust, self-service infrastructure
  • 3+ years of industry software engineering experience
  • Strong proficiency in Python
  • working proficiency in C++
  • You write clean, testable, well-documented code and care about long-term maintainability
  • Experience building data pipelines, ingestion, transformation, storage, and visualization
  • Familiarity with SQL and analytical workflows
  • Deep comfort with Linux systems
  • you've configured kernels, debugged boot issues, written systemd units, or managed bare-metal infrastructure
  • You understand networking, storage, and compute at a level beyond "it just works."
  • Experience setting technical vision and roadmap for a project or platform
  • driving alignment across multiple stakeholders
  • You've independently identified the cross-functional partners needed to unblock and deliver
  • you've briefed senior engineering leadership on trade-offs and recommendations
  • AI-Native
  • treat AI as a core part of your engineering workflow
  • you use agentic tooling (e.g., Claude Code) across the development lifecycle
  • you understand the boundaries