Head of Global Compute Supply & Platform Strategy

Luma AI · AI Frontier · SF Bay Area, CA · Research

This role leads global compute supply and platform strategy for a robotics foundation model company. It is responsible for the end-to-end global compute footprint, including capacity strategy, capital allocation, and systems architecture. The goal is to design a scaling roadmap that gives the research and robotics teams the compute they need to ship frontier world models. The role also involves leading the infrastructure, distributed systems, and datacenter operations teams, maximizing fleet utilization, managing large capital budgets, and serving as the primary interface with compute vendors.

Skills

Required

Compute Strategy
Capacity Planning
Vendor Partnerships
Cloud Partnerships
On-Prem vs. Cloud Mix
Accelerator Supply Chain
Custom Silicon Evaluation
Infrastructure Leadership
Distributed Systems
Datacenter Operations
Fleet Utilization Optimization
Cluster Configurations
Capital Deployment
Budget Management
Unit Economics
Risk Management
World-Model Training
Simulation Rollouts
On-Robot Inference
Elastic Fleet Management
High-Performance Cluster Topology
High-Speed Interconnects (InfiniBand/RoCE)
Large-Scale Data Systems
Distributed Training Architectures
10k+ Accelerator Environments
High-Performance Production Settings

Nice to have

Scale Credentials (>100B-parameter or >100k-GPU-day scale)
Robotics/Autonomy Context
Edge-to-Cloud Inference
Real-Time Autonomous Systems

Other signals

owns Luma’s global compute footprint end-to-end
design our scaling roadmap from the silicon up
turning capital into capability
Architect Multi-Year Compute Strategy
Direct the Platform Org
Maximize Fleet Utilization
Command a Megawatt Budget
Unify Global Capacity
Act as Principal Executive Interface

The Role

Compute is the ultimate physical and financial prerequisite for the robotics foundation models we are building. This role owns Luma’s global compute footprint end-to-end, bridging macro capacity strategy, multi-million-dollar capital allocation, and top-tier systems architecture. You will design our scaling roadmap from the silicon up, ensuring our research and robotics teams have the uninterrupted runway they need to ship frontier world models. As a member of the executive team, you will be the single person responsible for turning capital into capability.

What You'll Do

Architect Multi-Year Compute Strategy: Lead capacity planning, global vendor and cloud partnerships, the on-prem vs. cloud mix, and accelerator supply chain roadmaps (NVIDIA H- and B-series GPUs, custom silicon evaluation).
Direct the Platform Org: Provide strategic leadership to our infrastructure, distributed systems, and datacenter operations teams, scaling the organization to support next-generation compute demands.
Maximize Fleet Utilization: Oversee the architectural efficiency of our cluster configurations to deliver >50% Model FLOPs Utilization (MFU) on flagship training runs.
Command a Megawatt Budget: Negotiate, secure, and operate our largest-scale capital deployments for compute infrastructure, partnering directly with Finance to optimize unit economics and risk management.
Unify Global Capacity: Champion the platform strategy that enables world-model training, heavy simulation rollouts, and real-time on-robot inference to seamlessly share a single, elastic fleet.
Act as Principal Executive Interface: Serve as the primary commercial and strategic bridge to NVIDIA, AMD, hyperscalers, and frontier silicon vendors.

Qualifications:

10+ years of engineering leadership experience in large-scale distributed systems, infrastructure, or technical supply chain, with a proven track record of leading compute platform strategy at a frontier AI lab, hyperscaler, or major autonomy program.
Deep technical & commercial fluency in high-performance cluster topology, high-speed interconnects (InfiniBand/RoCE), large-scale data systems, and the economics of distributed training architectures.
Direct operational oversight of 10k+ accelerator environments in high-performance production settings.

Preferred qualifications:

Scale Credentials: Experience orchestrating capital or infrastructure for training runs at the >100B-parameter or >100k-GPU-day scale.
Robotics/Autonomy Context: Familiarity with the unique capacity and latency demands of edge-to-cloud inference and real-time autonomous systems.
