Principal Engineer, Data & Compute

Wayve · Robotics · London, United Kingdom +1 · AI Platform

This Principal Engineer, Data & Compute role at Wayve, a company developing Embodied AI for autonomous driving, focuses on designing and evolving the foundational compute and storage systems for large-scale AI model development, including training and inference workloads across thousands of GPUs and petabyte-scale data federation. It requires deep expertise in distributed systems, GPU infrastructure, and data architecture at that scale, with an emphasis on enabling AI research and rapid model deployment in a hybrid/multi-cloud environment.

What you'd actually do

  1. Global Compute Strategy – Define and evolve the architecture for how Wayve allocates and orchestrates training and inference workloads across thousands of GPUs and multiple data centers, ensuring optimal throughput, resiliency, and cost efficiency.
  2. Petabyte-Scale Data Federation – Design systems that enable fast, reliable access to high-volume sensor and simulation data across geographies, ensuring the right data is always available for training, evaluation, and inference, while preparing Wayve to operate at exabyte scale.
  3. Cross-Region GPU Job Execution – Build the foundations that enable large-scale AI workloads to run seamlessly across hybrid and multi-cloud environments.
  4. Cloud Infrastructure Advisory – Act as a trusted partner to leadership in aligning compute investments and architecture with company strategy, growth plans, and performance goals.
  5. Technical Leadership & Mentorship – Uplift the broader engineering org through architectural coaching and technical deep dives, and by cultivating a culture of operational and engineering excellence.

Skills

Required

  • designing and building large-scale distributed systems
  • GPU-based cloud infrastructure
  • enabling large-scale AI training, inference, or computer vision workloads in GPU clusters
  • petabyte-scale data architecture
  • storage federation
  • high-throughput access
  • data locality for AI workloads
  • technical leadership
  • defining and communicating architectural strategy
  • balancing long-term vision with delivery needs
  • mentoring engineers
  • influencing technical direction across teams
  • Computer Science, Electrical Engineering, or related field, or equivalent industry experience

Nice to have

  • multi-cloud orchestration
  • latency- or cost-sensitive training and inference pipelines
  • Ray
  • Kubernetes
  • Airflow
  • Flyte
  • AI/ML job scheduling
  • model lifecycle management
  • infrastructure-as-code practices
  • supporting safety-critical or real-time inference use cases
  • robotics
  • autonomous vehicles
  • aerospace
  • building infrastructure-as-a-product

What the JD emphasized

  • 10+ years designing and building large-scale distributed systems, with at least 4 years focused on GPU-based cloud infrastructure
  • Proven experience enabling large-scale AI training, inference, or computer vision workloads in GPU clusters
  • Deep understanding of petabyte-scale data architecture, including storage federation, high-throughput access, and data locality for AI workloads

Other signals

  • Design and guide the evolution of the foundational compute and storage systems that fuel our model development lifecycle
  • Define and evolve the architecture for how Wayve allocates and orchestrates training and inference workloads across thousands of GPUs and multiple data centers
  • Design systems that enable fast, reliable access to high-volume sensor and simulation data across geographies
  • Build the foundations that enable large-scale AI workloads to run seamlessly across hybrid and multi-cloud environments