Staff Software Engineer, DevProd (Infrastructure Observability)

Temporal · Enterprise · United States · DevProd

This is a Staff Software Engineer role on the Infrastructure Team with a focus on Observability, responsible for driving the productivity and reliability of Temporal's core platforms. The role involves leading the end-to-end Software Development Lifecycle for distributed systems, designing and building scalable observability solutions, and participating in the on-call rotation.

What you'd actually do

  1. Lead the end-to-end Software Development Lifecycle: goals & requirements solicitation, design & review, implementation, operationalization & deployment, support & maintenance.
  2. Lead feature design, review it with stakeholders, and iterate to incorporate feedback and drive consensus.
  3. Clearly document design choices and operational knowledge to successfully deploy and manage the software you develop.
  4. Provide appropriate unit, integration, and performance test coverage, along with production readiness coverage, for your feature ownership area.
  5. Set a high bar for technical excellence and take pride in the software you develop.

Skills

Required

  • Go
  • Kubernetes
  • SQL
  • AWS
  • GCP
  • computer architecture
  • operating systems
  • networking
  • monitoring
  • instrumenting
  • configuring infrastructure

Nice to have

  • ClickHouse
  • Prometheus
  • Grafana
  • Loki
  • Thanos
  • Temporal

What the JD emphasized

  • End-to-end Software Development Lifecycle
  • Design and build multi-component, distributed systems that operate at scale
  • Investigate issues with a methodical approach to identify a root cause
  • Understand performance and reliability implications of design options at scale
  • Expert-level knowledge of architecture and services of assigned domain
  • Strong command over all aspects of the Temporal ecosystem
  • Demonstrated ability to develop horizontally scalable, resilient, and high-performance distributed systems in a production environment
  • Experience designing, implementing, deploying, and supporting large-scale, geographically distributed observability and/or high-throughput data streaming/processing pipelines, or similar
  • Expert-level Kubernetes skills