Staff Software Engineer, Infrastructure - Marimo

Weights & Biases · Data AI · Bellevue, WA +5 · Remote · Technology

This Staff Software Engineer, Infrastructure role focuses on building the backend architecture for molab, a cloud-hosted marimo notebook service. The work covers high availability, low latency, stability, and fraud and abuse prevention; integration with CoreWeave's Kubernetes clusters and object storage; and high GPU utilization. The product is an open-source environment for data work, with AI-assisted coding mentioned as a feature.

What you'd actually do

  1. co-design and implement the backend architecture of molab
  2. solve for high availability, low latency (both rapid spin-up and spin-down of notebook kernels on demand and low-latency communication between the notebook frontend and backend), stability, and fraud and abuse
  3. design molab to run on CoreWeave's specialized Kubernetes-based clusters and integrate with CoreWeave object storage
  4. keep GPU utilization high

Skills

Required

  • 8+ years of experience in software engineering
  • Strong fundamentals that are language agnostic
  • Expertise in computer systems, including parallel computing (threading, multiprocessing), concurrency (asynchronous programming), networking/inter-process communication
  • Experience with containerization, container orchestration (Kubernetes), scheduling, networked filesystems, resource allocation, distributed systems, and cloud infrastructure
  • Experience building highly available, fault-tolerant systems
  • Strong communication skills, written and verbal

Nice to have

  • Proficiency with Python and Python packaging
  • Basic experience with or awareness of the Python stack for AI/ML
  • Empathy for practitioners and researchers in AI, ML, data engineering, NLP, or other quantitative work
  • Experience with GPU resource allocation and sharing

What the JD emphasized

  • high availability
  • low latency
  • fault-tolerant systems
  • highly scalable cloud infrastructure

Other signals

  • building the best open-source environment for working with data
  • world-class cloud platform designed for scale, performance, and collaboration
  • high impact team and product, undergoing rapid growth
  • building highly available, low-latency, and fault-tolerant systems
  • co-design and implement the backend architecture of molab
  • solving for high availability, low latency, stability, and fraud and abuse
  • design molab to run on CoreWeave's specialized Kubernetes-based clusters
  • integrate with CoreWeave object storage
  • solve for keeping utilization of GPUs high