Senior Site Reliability Engineer

Block · Fintech · NSW, Australia · Remote · 10402 Engineering - Product Platform Engineering

This Senior Site Reliability Engineer role focuses on improving the reliability of Block's platform and critical infrastructure. The role involves building and extending platforms, standardizing reliability tools, leading incident response, and driving platform-wide reliability improvements. A key aspect is leveraging and improving AI-driven tooling for observability, incident analysis, alert tuning, and operational workflows to reduce toil and accelerate problem-solving. The role requires oncall participation and experience with high-availability systems, CI/CD, and monitoring.

What you'd actually do

  1. Build and extend platforms to improve system reliability
  2. Work on team goals that encompass reliability for the entire company
  3. Standardize reliability tools across multiple platforms and organizations
  4. Triage, coordinate, and lead stabilization of sev 0–1 incidents
  5. Serve as primary oncall, maintaining structured escalation paths and escalating to leadership when needed

Skills

Required

  • Experience running production oncall for high-availability systems
  • Strong incident management skills — structured triage, mitigation under pressure, blameless postmortems
  • Fluency with CI/CD pipelines, progressive rollout strategies, and rollback automation
  • Monitoring & observability expertise — building/tuning alerts for uptime, error rates, latency regression, and resource exhaustion
  • 5+ years of software development experience

Nice to have

  • Familiarity with AI-driven tooling for observability, incident analysis, or automation
  • A mindset that naturally reaches for AI to accelerate problem-solving and reduce toil
  • Ability to create and maintain evidence-based maturity assessments using trailing 90-day data windows
  • Comfort with vendor/dependency management, including maintaining validated escalation contacts reachable within 5 minutes

What the JD emphasized

  • AI-driven tooling for observability, incident analysis, or automation
  • mindset that naturally reaches for AI to accelerate problem-solving and reduce toil