Staff Software Engineer, Managed Orchestration (managed Kubernetes)

Crusoe · Data AI · San Francisco, CA - US · Cloud Engineering

This role is for a Staff Software Engineer working on cloud software and infrastructure, specifically managing and scaling Kubernetes and AI training clusters. It involves writing code, contributing to architecture, evaluating tools, and keeping the infrastructure reliable and performant. Although the company operates in the AI infrastructure space, this role centers on software engineering and infrastructure management, not direct AI/ML model development or research.

What you'd actually do

  1. Contribute to the development of scalable and robust software solutions that align with the strategic objectives of the Crusoe Cloud roadmap
  2. Work with tech leads and engineers to foster an environment that encourages creativity and technical excellence and produces cutting-edge cloud solutions
  3. Stay abreast of the latest trends and techniques in cloud software and incorporate them to keep Crusoe’s offerings innovative
  4. Support the development of your peers by sharing knowledge and providing guidance in technical discussions, without taking on formal management responsibilities

Skills

Required

  • Go (Golang)
  • Kubernetes
  • Linux Engineering
  • Infrastructure as Code
  • Argo
  • CI/CD
  • Automated Testing
  • Kubernetes operators and controllers
  • System Architecture

Nice to have

  • GCP
  • Terraform

What the JD emphasized

  • 8+ years of experience working in software engineering
  • 2+ years of programming experience in GoLang
  • experience with Kubernetes, Linux engineering, and debugging
  • skilled in infrastructure as code
  • experience with Terraform and GCP (preferred)
  • understand Argo, CI/CD, and automated testing pipelines
  • build and manage Kubernetes operators and controllers
  • develop scalable systems to compete with leading services like Google Kubernetes Engine (GKE) and Amazon Elastic Kubernetes Service (EKS)
  • oversee critical projects with broad impact
  • design system architecture
  • excellent communication skills, both verbal and written