Staff Software Engineer, Cape

Crusoe · Data AI · San Francisco, CA - US · Cloud Engineering

This role is for a Staff Software Engineer focused on building and operating cloud infrastructure management systems and platforms for an AI infrastructure company. The engineer will own end-to-end use cases and workflows, ensuring the reliability, scalability, and operational efficiency needed to support AI workloads. Although the company is AI-focused, the role itself is infrastructure engineering; it does not involve directly building AI models or agents.

What you'd actually do

  1. Collaborate extensively across teams to architect, design, and implement physical infrastructure management software systems, availability platforms, and frameworks that meet the end-to-end needs of customers hosted on our AI infrastructure.
  2. Champion the reliability, scalability, and security of our systems and platforms.
  3. Develop workflows that drive efficiency and meet key business objectives and metrics.
  4. Design and implement high-performing, highly available cloud architectures optimized for both performance and cost-effectiveness.
  5. Streamline cloud deployment, configuration management, and operations by developing and maintaining effective platforms, interfaces, and automation tooling.

Skills

Required

  • Bachelor's degree in Computer Science or Software Engineering
  • 10+ years of relevant industry experience
  • 10+ years of experience building and operating distributed systems at scale
  • Proven experience with building reliable, scalable, efficient, and secure cloud platforms and systems and effectively running them in production environments
  • Fluency in one or more programming languages such as Go, Rust, Java, or C++
  • A collaborative, platform-first mindset
  • Solid understanding of cloud security best practices and the ability to implement secure configurations
  • Excellent troubleshooting and problem-solving skills
  • Strong written and verbal communication skills

Nice to have

  • Hands-on experience deploying, managing, and troubleshooting Kubernetes clusters
  • Experience working in a fast-paced startup environment
  • A passion for building energy-efficient, scalable AI infrastructure
  • Enthusiasm for sustainability and clean energy innovation

What the JD emphasized

  • 10+ years of relevant industry experience
  • 10+ years of experience building and operating distributed systems at scale
  • Proven experience with building reliable, scalable, efficient, and secure cloud platforms and systems and effectively running them in production environments