Senior Software Engineer, Managed Orchestration (managed Kubernetes)

Crusoe · Data AI · San Francisco, CA - US · Cloud Engineering

This role is for a Senior Software Engineer focused on cloud software and infrastructure, specifically managing and scaling Kubernetes and AI training clusters. The role involves writing and reviewing code, evaluating tools, and contributing to system architecture, with a focus on reliability, scalability, and operational costs. While the company operates in the AI infrastructure space, the core responsibilities of this specific role are centered on cloud infrastructure engineering and orchestration, not direct AI/ML model development or research.

What you'd actually do

  1. Contribute to the development of scalable and robust software solutions, closely aligning with the strategic objectives outlined in the Crusoe Cloud roadmap
  2. Work collaboratively with tech leads and engineers to create a dynamic environment where creativity and technical excellence are encouraged, leading to the development of cutting-edge cloud solutions
  3. Stay current with the latest trends and techniques in cloud software, and incorporate these insights to keep Crusoe’s offerings innovative
  4. While you won’t have formal management responsibilities, you will support the development of your peers by sharing knowledge and providing guidance in technical discussions

Skills

Required

  • 5-7 years of software engineering experience
  • strong experience in systems engineering
  • 2+ years of programming experience in Go (Golang)
  • experience with Kubernetes
  • Linux engineering and debugging
  • infrastructure as code
  • systems-level challenges
  • Argo
  • CI/CD
  • automated testing pipelines
  • building and managing Kubernetes operators and controllers
  • developing scalable systems that compete with leading services like Google Kubernetes Engine (GKE) and Amazon Elastic Kubernetes Service (EKS)
  • overseeing critical projects with broad impact
  • designing system architecture
  • owning system architecture, including CI/CD pipelines
  • adherence to security standards
  • excellent communication skills, both verbal and written

Nice to have

  • Terraform
  • GCP

What the JD emphasized

  • AI infrastructure
  • AI training clusters
  • managed Kubernetes