Senior Staff Software Engineer, Managed Orchestration

Crusoe · Data AI · San Francisco, CA - US · Cloud Engineering

A Senior Staff Software Engineer role leading architectural initiatives for cloud software, specifically the management of Kubernetes and AI training clusters, with an emphasis on scalability, reliability, and efficiency. The role involves defining technical direction, establishing best practices, and influencing platform-level decisions for AI infrastructure.

What you'd actually do

  1. Drive the development of scalable, resilient, and high-performance software solutions, aligning with and helping shape the strategic objectives of the Crusoe Cloud roadmap
  2. Provide technical leadership across multiple teams, fostering a culture of innovation, engineering excellence, and accountability while enabling teams to deliver cutting-edge cloud solutions
  3. Define and evolve architectural standards and best practices, ensuring consistency, scalability, and long-term maintainability across systems
  4. Stay ahead of emerging trends and technologies in cloud software, proactively shaping Crusoe’s technical direction and incorporating innovations that maintain competitive advantage
  5. Act as a mentor and multiplier for engineering talent, elevating team capabilities through coaching, design reviews, and thought leadership in technical discussions

Skills

Required

  • 10+ years of experience working in software engineering
  • deep expertise in systems engineering and large-scale distributed systems
  • 3+ years of programming experience in Go (Golang)
  • extensive experience with Kubernetes and Linux engineering
  • highly skilled in infrastructure as code
  • strong understanding of complex systems-level challenges at scale
  • experience with Terraform
  • strong understanding of Argo, CI/CD, and automated testing pipelines
  • experience architecting, building, and evolving Kubernetes operators and controllers
  • experience designing and operating large-scale systems comparable to leading services like Google Kubernetes Engine (GKE) and Amazon Elastic Kubernetes Service (EKS)
  • track record of leading and delivering critical, high-impact projects
  • ability to define and own system architecture end-to-end
  • exceptional communication skills

Nice to have

  • GCP
  • scaling them for large organizations

What the JD emphasized

  • deep expertise in orchestration and optimization
  • advanced debugging and performance optimization
  • complex systems-level challenges at scale
  • architect, build, and evolve Kubernetes operators and controllers
  • designing and operating large-scale systems comparable to leading services like Google Kubernetes Engine (GKE) and Amazon Elastic Kubernetes Service (EKS)
  • critical, high-impact projects
  • system architecture end-to-end