Senior Software Engineer, Storage Engineer

Weights & Biases · Data AI · Bellevue, WA +3 · Technology

CoreWeave is seeking a Senior Software Engineer for its Storage Engineering team. The role involves designing and implementing distributed storage solutions for AI workloads, optimizing storage performance and efficiency, and improving the reliability, durability, and observability of the storage stack. The engineer will work with technologies such as RDMA, GPUDirect Storage, and distributed filesystems, and will collaborate with operations and other teams to deliver seamless storage capabilities to customers.

What you'd actually do

  1. Design and implement distributed storage solutions that support scaling data-intensive AI workloads.
  2. Contribute to the development of exabyte-scale, S3-compatible object storage and integrate dedicated storage clusters into diverse customer environments.
  3. Work with technologies such as RDMA, GPUDirect Storage, and filesystem protocols and interfaces such as NFS or FUSE to optimize storage performance and efficiency.
  4. Participate in efforts to improve the reliability, durability, and observability of our storage stack.
  5. Collaborate with operations teams to monitor, troubleshoot, and improve storage systems in production environments.

Skills

Required

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
  • 4–8 years of experience working in storage systems engineering or infrastructure.
  • Strong hands-on experience with object storage or distributed filesystems in production environments.
  • Experience with one or more storage protocols (e.g., S3, NFS) and with storage systems such as Ceph, DAOS, or similar.
  • Proficiency in a systems programming language such as Go, C, or Rust.
  • Familiarity with storage observability tools and telemetry pipelines (e.g., ClickHouse, Prometheus, Grafana).
  • Experience working with cloud-native infrastructure, Kubernetes, and scalable system architecture.

Nice to have

  • Expertise in data persistence on physical media
  • High-performance data transfer using RDMA
  • Experience building resilient distributed systems

What the JD emphasized

  • support scaling data-intensive AI workloads
  • exabyte-scale, S3-compatible object storage
  • optimize storage performance and efficiency
  • improve the reliability, durability, and observability of our storage stack
  • customer workloads