Senior Software Engineer, Cloud Enablement

Temporal · Enterprise · United States · Cloud Global Services

We are looking for a Senior Software Engineer to join the Cloud Enablement team, part of Temporal’s Cloud Global Services (CGS) organization. The Cloud Enablement team applies and extends the Temporal OSS replication stack to power critical Temporal Cloud capabilities. These include High Availability (HA) namespaces, error detection and automated failover, and migration of workloads and namespaces between self-hosted Temporal clusters and Temporal Cloud, as well as within Temporal Cloud.

As a Senior Engineer, you will work on backend systems at the core of Temporal Cloud’s enterprise offerings. These systems must be correct, reliable, observable, and safe to operate at scale, even in the presence of partial failures, network partitions, and evolving customer workloads. You’ll collaborate closely with engineers on the CGS Replication Foundations, Cloud, Infrastructure, and OSS teams to deliver production-grade features for customers running mission-critical workflows.

What you'd actually do

  1. Design and implement backend features that apply and extend the Temporal OSS replication stack to new Temporal Cloud capabilities
  2. Contribute to Temporal Cloud High Availability features, including HA namespaces, error detection, and automated failover
  3. Build and improve namespace migration systems, including migrations between self-hosted Temporal clusters and Temporal Cloud and within Temporal Cloud
  4. Own medium-to-large features end-to-end, from design through production rollout and long-term maintenance
  5. Write clear design documentation describing system behavior, tradeoffs, and failure modes

Skills

Required

  • Strong experience designing and building distributed backend systems with a focus on reliability and scalability
  • Hands-on experience operating production systems, including debugging failures and improving observability
  • Experience developing highly concurrent systems
  • Demonstrated ability to write concurrent production code, preferably in Go (Java or similar languages also welcome)
  • Solid understanding of failure modes, replication, and resiliency patterns in distributed systems
  • Ability to independently drive work from problem definition to delivery, while collaborating closely with peers and stakeholders
  • A mindset focused on building systems that are safe to operate, easy to reason about, and resilient to change

Nice to have

  • Experience with replication, failover, or disaster recovery systems
  • Experience designing or operating migration tooling for distributed systems
  • Familiarity with cloud infrastructure and containerized environments (e.g., Kubernetes)

What the job description emphasized

  • correct, reliable, observable, and safe to operate at scale