Staff Software Engineer (Backend Engineer + Streaming)

Crusoe · Data AI · Sunnyvale, CA - US · Cloud Engineering

Staff Streaming Software Engineer for the Observability team, responsible for building and operating real-time data platforms (metrics, logs, traces, event streams) for AI cloud infrastructure. The role involves defining technical strategy, leading architectural design for high-throughput streaming systems, and ensuring reliability and scalability.

What you'd actually do

  1. Defining the technical strategy and multi-quarter roadmap for streaming infrastructure that ingests and processes observability data including logs, metrics, traces, and operational events
  2. Leading architectural design for large-scale, high-throughput streaming systems using technologies such as Kafka, Kinesis, Pub/Sub, Flink, or similar platforms—setting standards that scale across teams
  3. Driving solutions to the hardest reliability and scalability challenges: high-cardinality workloads, bursty traffic patterns, cross-region data movement, and fault-tolerant delivery semantics
  4. Partnering with SREs, platform teams, and product engineering to define how streaming data integrates into internal observability tooling and operational workflows company-wide
  5. Establishing engineering best practices around instrumentation, CI/CD, infrastructure-as-code, and incident management for streaming systems

Skills

Required

  • distributed streaming or real-time data platforms
  • Kafka or similar distributed streaming technologies
  • backend languages such as Java, Scala, Go, or Python
  • reliability engineering
  • leading complex projects
  • influencing engineering culture

Nice to have

  • observability platforms at cloud or data center scale
  • stream processing frameworks
  • exactly-once semantics
  • schema management
  • Kubernetes
  • large-scale containerized infrastructure
  • data contracts
  • serialization formats
  • schema registries
  • bare-metal infrastructure
  • large-scale data center environments
  • growing senior engineers into technical leads

What the JD emphasized

  • set the technical direction
  • driving architectural decisions
  • long-term investments
  • deep technical execution
  • organizational influence
  • shaping how teams build and operate streaming infrastructure
  • align platform strategy with company goals
  • define how observability data moves at scale
  • leave a lasting architectural footprint
  • architecting and operating distributed streaming or real-time data platforms at significant scale
  • track record of driving technical decisions that have lasting, cross-team impact
  • shaping how systems are designed
  • operational experience in production at scale
  • reliability engineering
  • lead complex, ambiguous projects
  • influencing engineering culture and technical standards beyond your immediate team
  • long-horizon investments