Senior Manager, Data Center Operations

Crusoe · Data AI · Sunnyvale, CA - US · Data Center Operations (DIG)

This role is for a Senior Manager of Data Center Operations who serves as the operational anchor for Crusoe's initial deployment in Sparks, Nevada, and oversees West Coast regional operations, including an AI Lab in San Jose, California. The role involves defining, building, and reporting on KPIs; overseeing specialized electrical and mechanical infrastructure; managing the hardware lifecycle; and leading a team of technicians. Although Crusoe is an AI infrastructure company, the role focuses on data center operations and infrastructure, not direct AI/ML model development or deployment.

What you'd actually do

  1. Design and implement a robust framework of Key Performance Indicators (KPIs) from scratch. You will define and track metrics for uptime, MTTR (Mean Time to Repair), deployment velocity, and power utilization, providing data-driven updates to executive leadership.
  2. Act as the technical lead for the "white space" while maintaining a deep understanding of the specialized electrical and mechanical systems (UPS, PDUs, specialized cooling) that support our unique Sparks deployments.
  3. Lead the operational rollout for Crusoe Sparks (NV) and the San Jose Lab (CA). Develop the roadmap for scaling operations as the West Coast Region expands.
  4. Oversee the day-to-day maintenance of AI-optimized hardware. Drive rapid diagnostics, component replacement (GPU trays, DIMMs, etc.), and streamlined RMA processes across the region.
  5. Bridge the gap between the San Jose Lab and our Crusoe Cloud production sites. Document deployment standards that allow seamless hardware transitions from experimental lab phases to large-scale production.
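
As an illustration of two of the KPIs named in item 1, the sketch below computes MTTR and uptime from a list of incident records. It is a minimal example, not Crusoe's actual tooling: the `Incident` record, the function names, and the assumption that incidents do not overlap and fall entirely inside the reporting window are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical incident record: when service went down and when it was restored."""
    start: datetime
    restored: datetime


def total_downtime(incidents):
    # Sum of repair durations; assumes incidents do not overlap.
    return sum((i.restored - i.start for i in incidents), timedelta(0))


def mttr(incidents):
    """Mean Time to Repair: total repair time divided by the number of incidents."""
    if not incidents:
        return timedelta(0)
    return total_downtime(incidents) / len(incidents)


def uptime_pct(incidents, window_start, window_end):
    """Uptime as a percentage of the reporting window.

    Assumes every incident falls entirely inside the window.
    """
    window = window_end - window_start
    return 100.0 * (1 - total_downtime(incidents) / window)
```

With a one-hour and a three-hour incident in a 100-hour window, `mttr` returns two hours and `uptime_pct` returns roughly 96 percent. Deployment velocity and power utilization would need their own definitions and data sources, which the posting does not specify.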

Skills

Required

  • 8+ years in data center operations, managing distributed white space or lab environments across multiple locations.
  • Strong technical understanding of data center electrical and mechanical systems.
  • Demonstrated experience defining and building operational metrics.
  • Hands-on experience with enterprise-grade server architecture.
  • Experience operating in colocation or leased-space environments.
  • Willingness to travel between Crusoe Cloud data center locations as needed.

Nice to have

  • Specific experience with GPU-heavy clusters (NVIDIA/AMD) is highly preferred.

What the JD emphasized

  • defining, building, and reporting on the KPIs
  • defining and building operational metrics
  • GPU-heavy clusters