Senior Engineering Manager, Cloud Storage

Crusoe · Data AI · San Francisco, CA - US · Cloud Engineering

Crusoe is seeking a Senior Engineering Manager to lead the team responsible for the architecture, delivery, and operations of its high-performance AI cloud storage platform. The role involves managing engineers, translating product requirements into technical plans, providing technical oversight, and collaborating cross-functionally to ensure the storage infrastructure meets the demands of AI and HPC workloads.

What you'd actually do

  1. Lead and grow a team of software engineers spanning all levels with a focus on career development, performance management, and building a high-performing engineering culture
  2. Clearly translate product requirements into actionable engineering plans, from architecture and design through task breakdown, estimation, and delivery
  3. Lead architecture and design reviews, providing substantive technical input and holding the bar for quality across your team's work
  4. Serve as the primary technical representative for Storage in cross-functional initiatives, ensuring your team's needs and constraints are well-understood by partner teams and leadership
  5. Develop a strong understanding of the full storage stack from NAND to GPU, focusing on performance characteristics and submillisecond latency required by our AI/ML customer access patterns

Skills

Required

  • 4+ years of engineering management experience
  • Experience leading teams of 6+ engineers at multiple levels
  • Background as a strong individual contributor (software engineer, tech lead, or architect) who moved into management
  • 4+ years of hands-on experience with distributed storage systems such as Ceph, GlusterFS, or MinIO
  • Hands-on experience with production block, file, and object storage systems
  • Experience with performance optimization and capacity planning for large-scale storage systems
  • Deep working knowledge of high-performance storage protocols including NFS over RDMA, NVMe-oF, S3, and RoCE/InfiniBand fabrics
  • Familiarity with modern storage technologies (NVMe, RDMA, DPUs) and their impact on system performance, including kernel-level concepts around I/O subsystems and volumes
  • Strong grasp of programming methodologies and a background in C/C++, Go, Java, or Rust

Nice to have

  • career development
  • performance management
  • building a high-performing engineering culture
  • structured performance reviews
  • individualized growth paths
  • ownership is distributed
  • calculated risk-taking is encouraged
  • engineers are empowered to solve hard problems
  • recruiting and hiring
  • product requirements
  • actionable engineering plans
  • architecture and design
  • task breakdown
  • estimation
  • delivery
  • dependencies
  • resourcing
  • sequencing
  • parallel workstreams
  • partner teams
  • external vendors
  • storage integrates cleanly
  • broader Crusoe Cloud platform
  • architecture and design reviews
  • substantive technical input
  • holding the bar for quality
  • technical depth
  • meaningful code reviews
  • senior engineers
  • complex design decisions
  • incident response
  • storage-related issues
  • fast resolution
  • durable fixes
  • business-critical systems
  • full storage stack from NAND to GPU
  • performance characteristics
  • submillisecond latency
  • AI/ML customer access patterns
  • primary technical representative for Storage
  • cross-functional initiatives
  • team's needs and constraints
  • partner teams and leadership
  • strong working relationships
  • Compute, SDN, SRE, and CCX teams
  • storage is a reliable, well-integrated part of the overall platform
  • hardware vendors
  • enterprise customers
  • performance issues
  • next-generation hardware-software design
  • developing people
  • scaling impact through your team
  • managing performance clearly and constructively
  • navigating difficult people situations
  • sound judgment
  • good process
  • making decisions with incomplete information
  • balancing near-term execution with longer-term technical quality
  • adaptable in your technical approach

What the JD emphasized

  • high performance AI cloud storage platform
  • petabyte scale infrastructure
  • next generation, bespoke cloud storage platform optimized for AI and HPC workloads
  • submillisecond latency required by our AI/ML customer access patterns
  • 4+ years of hands-on experience with distributed storage systems such as Ceph, GlusterFS, or MinIO
  • production block, file, and object storage systems
  • performance optimization and capacity planning for large-scale storage systems
  • high performance storage protocols including NFS over RDMA, NVMe-oF, S3, and RoCE/InfiniBand fabrics
  • kernel level concepts around I/O subsystems and volumes
  • cloud native storage solutions