Sr. Site Reliability Engineer

PitchBook · Fintech · Seattle, WA · Technology Operations

PitchBook is seeking a Sr. Site Reliability Engineer to build and evolve systems for reliable and consistent operation of its product suite. The role involves defining and achieving service level objectives (SLOs), deploying and managing production systems, incorporating observability tools, performing incident management, and mentoring other engineers. The position requires strong skills in Kubernetes, Java, observability, event-driven architecture, and Python, with experience in cloud-native patterns and building internal developer tools. The role also involves designing developer experience tooling and AI-leveraged workflows.

What you'd actually do

  1. Build and maintain internal platform services, Kubernetes operators, and observability tooling that support enterprise reliability at scale
  2. Design and deliver developer experience tooling and AI-leveraged workflows through Claude Code plugins and MCP servers
  3. Establish service level objectives (SLOs), error budgets, and service level indicators (SLIs) as success criteria, and ensure that our systems and processes consistently meet or exceed these targets
  4. Build recoverability into our services and systems, including disaster recovery (DR), backup and recovery, and multi-AZ and multi-region designs in cloud constructs
  5. Build observability systems and services (monitoring, telemetry, tracing) for reuse in our platform architecture, creating alerting for fault identification and building dashboards for metrics

Skills

Required

  • Kubernetes
  • Java (Spring Boot)
  • observability
  • event-driven architecture
  • Python
  • Clean Architecture
  • cloud-native deployment patterns
  • building internal tools
  • Linux/UNIX-based systems
  • cloud environments (GCP & AWS)
  • Reliability Engineering
  • DevOps
  • infrastructure-as-code tools (Terraform, Puppet, Ansible, Chef)

Nice to have

  • Master's degree

What the JD emphasized

  • Requires strong knowledge of Kubernetes, Java (Spring Boot), observability, event-driven architecture, and Python; experience with Clean Architecture, cloud-native deployment patterns, and building internal tools that improve developer productivity across engineering organizations.