Staff Software Engineer, Compute (Temporal Cloud)

Temporal · Enterprise · United States · Compute

This Staff Software Engineer role focuses on building managed compute primitives for Temporal Cloud, which powers durable execution for AI and enterprise systems. The work involves designing autoscaling systems, ensuring multi-tenant safety, and delivering production-grade observability for compute services.

What you'd actually do

  1. Create new managed compute primitives that feel first-class in Temporal Cloud: crisp abstractions, clean APIs, and an extension story across compute providers.
  2. Design self-optimizing autoscaling systems (signals, backstops, debouncing, guardrails) that scale worker fleets safely and predictably.
  3. Define the Open Source Server ↔ Cloud boundary for compute capabilities, keeping the architecture cohesive and maintainable.
  4. Architect, build, and operate services on the hot path of task execution where performance and correctness are customer-visible.
  5. Deliver real-world cloud integrations (e.g., customer-account execution): IAM boundaries, secure credentials handling, networking constraints, quotas, and failure modes.

Skills

Required

  • Significant experience building distributed systems or multi-tenant platform services (design, implementation, and production operations).
  • Strong systems fundamentals: concurrency, performance, reliability, and failure-mode thinking.
  • A record of shipping platform primitives used by other engineers/customers (APIs, control planes, data planes).
  • Comfort owning outcomes: SLOs, incident response, and improving on-call quality over time.
  • Excellent written communication and crisp tradeoff thinking.

Nice to have

  • Experience with Go.
  • Experience building cloud infrastructure platforms.
  • Experience with IAM/security boundaries for cross-account execution models.
  • Experience building Kubernetes controllers/CRDs or operating heterogeneous worker fleets.

What the JD emphasized

  • multi-cloud
  • multi-tenant platforms
  • cloud infrastructure
  • autoscaling systems
  • multi-tenant safety
  • production-grade observability
  • customer-account execution
  • IAM boundaries
  • secure credentials handling
  • networking constraints
  • quotas
  • failure modes