Staff Storage Engineer

Crusoe · Data AI · San Francisco, CA - US · Cloud Engineering

Staff Storage Engineer responsible for the architecture and operation of the data layer for an AI cloud. This role involves managing the end-to-end lifecycle of storage environments, performance analysis and optimization, validation and testing of new storage technologies, and influencing vendor roadmaps to support AI training and inference workloads.

What you'd actually do

  1. Evaluate performance of block, file, and object storage systems across diverse workloads.
  2. Design and execute Proof of Concept (PoC) exercises to put new arrays through their paces.
  3. Own the initial bring-up, configuration, and ongoing performance tuning of large enterprise arrays.
  4. Collaborate with the Compute and Networking teams to build a seamless "gold standard" cloud infrastructure.
  5. Lead the technical evaluation of new storage technologies.

Skills

Required

  • 10+ years of experience in storage systems administration with a heavy focus on petabyte-scale, on-premises data environments.
  • Strong understanding of storage architectures (block, file, object) and I/O paths.
  • Hands-on experience with performance benchmarking and observability tools (FIO, ElBencho, blktrace, nvme-cli, nfs-gaze, eBPF, etc.).
  • Experience with SSDs, NVMe, RAID, caching, or distributed storage systems.
  • Deep familiarity with enterprise flash arrays and distributed file systems.
  • Proficiency with scripting (Python, Go or Bash) to automate array management and monitoring.
  • Ability to analyze complex performance data and present clear conclusions.
  • Proven ability to lead the authoring of technical requirements, evaluate RFP responses and manage complex vendor relationships.
  • Experience with system design for specific I/O use cases (AI training/inference) and a disciplined approach to testing and validating new vendor releases.

Nice to have

  • Experience with RDMA, iSCSI, NVMe-oF, RoCEv2 or InfiniBand networking as it relates to high-performance storage.
  • Previous experience at a major Cloud Service Provider (CSP) or a high-scale AI infrastructure company.
  • Familiarity with distributed storage systems (Ceph, Lustre, Gluster, etc.).
  • Specific experience with VAST Data or Pure Storage (Everpure) is highly preferred.

What the JD emphasized

  • petabyte-scale, on-premises data environments
  • AI training/inference I/O patterns
  • authoring RFPs, reviewing vendor responses and managing complex vendor relationships