Sr. Engineer, Storage

Weights & Biases · Data AI · Bellevue, WA +4 · Technology

CoreWeave is seeking a Sr. Engineer, Storage to design and implement distributed storage solutions for AI workloads. The role involves working with exabyte-scale object storage and distributed filesystems, and optimizing storage performance and reliability using technologies such as RDMA, GPUDirect Storage, NFS, and FUSE. The engineer will also lead efforts to improve the reliability, durability, security, and observability of the storage stack, and will collaborate with operations teams to monitor, troubleshoot, and improve storage systems in production. Experience with AI tools for software development and with storage observability tooling (telemetry pipelines, ClickHouse, Prometheus, Grafana) is required.

What you'd actually do

  1. Design and implement distributed storage solutions to support scaling data-intensive AI workloads.
  2. Contribute to the development of exabyte-scale, S3-compatible object storage and integrate dedicated storage clusters into diverse customer environments.
  3. Work with technologies such as RDMA, GPUDirect Storage, and distributed filesystem protocols and interfaces such as NFS or FUSE to optimize storage performance and efficiency.
  4. Lead efforts to improve the reliability, durability, security, and observability of our storage stack.
  5. Collaborate with operations teams to monitor, troubleshoot, and improve storage systems in production environments.

Skills

Required

  • distributed storage solutions
  • exabyte-scale object storage
  • S3-compatible object storage
  • distributed filesystems
  • RDMA
  • GPUDirect Storage
  • NFS
  • FUSE
  • storage reliability
  • storage durability
  • storage security
  • storage observability
  • telemetry pipelines
  • ClickHouse
  • Prometheus
  • Grafana
  • cloud-native infrastructure
  • Kubernetes
  • scalable system architectures
  • Go
  • C
  • Rust
  • AI tools to augment software development

Nice to have

  • Ceph
  • DAOS
  • storage systems engineering
  • infrastructure

What the JD emphasized

  • storage systems engineering
  • object storage
  • distributed filesystems
  • S3
  • NFS
  • RDMA
  • GPUDirect Storage
  • distributed storage
  • AI workloads