Staff Infrastructure Engineer

Crusoe · Data AI · San Francisco, CA, US · Cloud Engineering

The Staff Software Infrastructure Engineer manages and scales Crusoe's cloud infrastructure, with a focus on server provisioning, automation, and the transition to Kubernetes. The role also involves troubleshooting hardware, especially GPUs, and supporting rapid growth in AI compute operations.

What you'd actually do

  1. Manage and maintain day-to-day operations of Crusoe’s cloud infrastructure.
  2. Develop automation tools to streamline server provisioning and shorten turnaround times against SLAs.
  3. Scale infrastructure to support mass deployments (80-100 servers simultaneously).
  4. Troubleshoot hardware issues, especially with GPUs, and liaise with vendors.
  5. Transition Crusoe’s environment to Kubernetes and containerized workflows.

Skills

Required

  • Solid hardware experience
  • GPU troubleshooting expertise
  • Strong Linux background
  • Knowledge of PXE booting and bare-metal server provisioning
  • Experience with BMC/IPMI, BIOS, and enterprise-grade server management
  • Kubernetes proficiency (admin or developer)
  • Familiarity with containerization technologies (Docker preferred)
  • Experience with version control systems (GitLab)
  • Problem-solving skills
  • Strong communication and collaboration skills

Nice to have

  • MAAS (Metal as a Service)
  • Python or Golang
  • Kubernetes administration and deployment experience
  • Ansible and Terraform

What the JD emphasized

  • Infrastructure as code
  • Scaling operations
  • High growth
  • Kubernetes
  • GPU troubleshooting