Senior Engineering Manager, Compute

Crusoe · Data AI · San Francisco, CA, US · Cloud Engineering

In this role, you will manage engineers building and optimizing the compute infrastructure for AI workloads, with a focus on virtualization, bare-metal provisioning, and hypervisor tuning. The team's work directly shapes the performance and cost-efficiency of AI compute for enterprises and researchers as Crusoe scales a specialized cloud from the ground up.

What you'd actually do

  1. Hire, mentor, and scale a world-class team of engineers. You will define performance expectations, foster a culture of technical excellence, and build career growth paths for your direct reports.
  2. Lead the development and optimization of Crusoe’s compute stack, from bare-metal orchestration to hypervisor tuning (KVM/QEMU) and kernel subsystems (NUMA, memory management, scheduling).
  3. Collaborate with hardware and networking teams to optimize performance for massive GPU/TPU clusters, SmartNICs, and high-speed interconnects.
  4. Oversee the reliability and scalability of our compute services. You will guide the team through complex distributed systems challenges and ensure high availability across our global data center footprint.
  5. Partner with Product, Infrastructure, and Site Reliability Engineering (SRE) to define and execute a roadmap that balances rapid innovation with the stability of a "gold standard" cloud provider.

Skills

Required

  • 5+ years of experience in engineering management, leading teams that build distributed systems, cloud infrastructure, or high-performance computing platforms.
  • Strong systems programming skills in Go, C/C++, or Rust.
  • Deep understanding of Linux internals and virtualization technologies.
  • Proven ability to lead teams through ambiguity and deliver mission-critical software in a fast-paced, high-growth environment.
  • Strategic mindset: you can bridge the gap between low-level technical trade-offs and high-level business goals, clearly communicating complex concepts to stakeholders.

Nice to have

  • Experience at a major Cloud Service Provider (CSP) or in a high-scale AI infrastructure company.
  • Familiarity with GPU-based workloads, InfiniBand, or RoCE networking.
  • Contributions to open-source projects in the Linux kernel or virtualization space.

What the JD emphasized

  • high-performance hardware
  • cloud-native software
  • GPU clusters
  • AI and HPC workloads
  • virtualization
  • bare-metal provisioning
  • kernel-level optimization
  • VM as a Service
  • Cloud Hypervisor Development
  • Open Source contributions
  • performance-per-dollar
  • AI Enterprises
  • sustainable, hyperscale compute power
  • systems programming
  • Linux internals
  • virtualization technologies
  • mission-critical software
  • fast-paced, high-growth environment
  • low-level technical trade-offs
  • high-level business goals
  • AI revolution