Senior Staff Software Engineer, Cape

Crusoe · Data AI · San Francisco, CA - US · Cloud Engineering

This role is for a Senior Staff Software Engineer focused on building and operating cloud infrastructure management systems and platforms for an AI infrastructure company. The engineer will own the technical vision, set engineering direction, and ensure infrastructure scales to meet business demands. Responsibilities include leading architecture and design of physical infrastructure management software, availability platforms, and frameworks, defining engineering standards, driving workflow development, designing cloud architectures, and leading the development of cloud deployment platforms and automation tooling. The role also involves mentoring engineers and collaborating with cross-functional teams.

What you'd actually do

  1. Lead the architecture and design of physical infrastructure management software systems, availability platforms, and frameworks that support end-to-end customer use cases across our AI infrastructure.
  2. Define and uphold engineering standards for reliability, scalability, and security across platforms and systems.
  3. Drive the development of workflows that meet critical business objectives and key revenue metrics.
  4. Own the design of high-performing, highly available cloud architectures that balance performance, resilience, and cost-effectiveness.
  5. Lead the development and maintenance of cloud deployment platforms, configuration management systems, and automation tooling that streamline operations across the organization.

Skills

Required

  • Bachelor's degree in Computer Science or Software Engineering
  • 10+ years of relevant industry experience, with a track record of leading architecture and delivery of large-scale, complex infrastructure systems
  • 10+ years of experience building and operating distributed systems at scale, with demonstrated ownership of production cloud platforms
  • Deep expertise in designing and delivering reliable, scalable, efficient, and secure cloud systems, with a strong understanding of what it takes to run them effectively in production
  • Fluency in one or more programming languages such as Go, Rust, Java, or C++
  • Exceptional ability to work across teams with a platform mindset
  • Deep understanding of cloud security best practices, with a proven ability to embed security into architecture and design decisions
  • Outstanding troubleshooting and problem-solving skills, with the ability to navigate ambiguity and resolve complex infrastructure challenges
  • Strong leadership and communication skills, with the ability to align stakeholders, influence without authority, and clearly articulate technical vision and tradeoffs

Nice to have

  • Hands-on experience deploying, managing, and troubleshooting Kubernetes clusters at scale
  • Significant experience in fast-moving, high-growth startup environments
  • A passion for building energy-efficient, scalable AI infrastructure that pushes the boundaries of what cloud platforms can do
  • Enthusiasm for sustainability, clean energy, and Crusoe's mission to revolutionize how AI computing is powered

What the JD emphasized

  • scale to meet the demands of a rapidly growing business
  • scale Crusoe Cloud by 10X and beyond
  • large-scale, complex infrastructure systems
  • distributed systems at scale
  • run them effectively in production
  • building systems that others can rely on and driving broad adoption across engineering and operations