Staff Software Engineer, Storage

Crusoe · Data AI · San Francisco, CA - US · Cloud Engineering

Staff Software Engineer focused on architecting and scaling storage infrastructure for AI workloads. The role involves defining the long-term technical strategy for Crusoe's storage engine, applying systems programming expertise in C, C++, Go, and/or Rust, and implementing solutions using industry-standard storage protocols. Responsibilities include deep performance engineering, open-source contributions, and acting as the technical authority on critical architecture decisions. Bonus points for familiarity with public clouds and AI/ML frameworks.

What you'd actually do

  1. Define and drive the long-term technical strategy for Crusoe’s storage engine.
  2. Build the foundations of our V2 storage re-architecture, applying proven systems programming experience in languages such as C, C++, Go, and/or Rust.
  3. Architect and implement solutions using industry-standard storage protocols such as NFS, SMB, iSCSI, and NVMe/TCP.
  4. Maintain a track record of contributions to the open-source community (e.g., Ceph, GlusterFS, Lustre, Spectrum Scale, OpenEBS).
  5. Serve as the final arbiter for critical architecture decisions across the Foundations organization.

Skills

Required

  • C, C++, Go, and/or Rust
  • NFS, SMB, iSCSI, and NVMe/TCP
  • Ceph, GlusterFS, Lustre, Spectrum Scale, OpenEBS
  • Distributed cloud computing infrastructure products
  • Troubleshooting and performance tuning skills
  • Distributed state and data protection at petabyte scale
  • Professional software engineering practices for the full SDLC

Nice to have

  • AWS, GCP, Azure, OCI
  • PyTorch, TensorFlow, JAX
  • DAOS or SPDK
  • RDMA
  • SmartNICs and RoCEv2
  • Cassandra, MongoDB, Redis, or Kafka
  • CAP theorem, Paxos/Raft, consistent hashing, and sharding strategies
  • Master's or PhD in Computer Science, Engineering, or a related field

What the JD emphasized

  • primary architect and visionary for our storage strategy
  • multi-year technical roadmap
  • architectural strategy, integrity and global scalability
  • AI-scale infrastructure
  • physics of the stack
  • V2 storage re-architecture
  • Deep Performance Engineering
  • kernel-level IO context switching
  • global tail-latency
  • 12+ years of experience building and operating large-scale, complex distributed cloud computing infrastructure products
  • entire IO path
  • petabyte scale