Engineering Manager - SRE

Booking.com · Hospitality · Amsterdam, Netherlands · Engineering

Engineering Manager for the Site Reliability Engineering (SRE) team within Core Platforms at Booking.com. The role focuses on owning the strategy and delivery of storage services, driving reliability, efficiency, security, and developer experience across on-prem and cloud environments. Responsibilities include coaching and growing an SRE team, setting execution standards, and partnering with stakeholders to improve availability, latency, and scalability of mission-critical systems.

What you'd actually do

  1. Lead, mentor, and grow a high‑performing SRE team; set goals, run performance and craft coaching, and foster an inclusive, feedback‑rich culture.
  2. Own the reliability roadmap for server and storage services (availability, performance, scalability, latency), balancing near‑term delivery with long‑term platform health.
  3. Drive engineering excellence: SLIs/SLOs and error budgets, incident response/postmortems, capacity planning, and reliability reviews across owned services.
  4. Partner with platform, networking, security, and product stakeholders to align on reliability requirements and unblock delivery across dependencies.
  5. Advocate for pragmatic automation and “operations as software”—reducing toil and improving operability through robust tooling, documentation, and guardrails.

Skills

Required

  • Experience as an Engineering Manager (or equivalent tech leadership role) leading SRE/Platform teams that run large-scale distributed systems in production.
  • Strong SRE craft across SLIs/SLOs, incident management, capacity planning, resiliency patterns, and reliability-focused software engineering and automation.
  • Hands-on familiarity partnering on or guiding work across Kubernetes, service mesh, and AWS-based workloads; comfort steering trade‑offs for reliability and velocity.
  • Solid grounding in Linux systems, networking, and observability practices (metrics, logs, tracing) applied to high‑throughput, highly available infrastructure.
  • Excellent stakeholder management, prioritization, and communication skills; capable of aligning multiple teams and leading through influence.
  • Experience with automated provisioning on private/public clouds and bare metal.
  • Experience with large storage devices in a business-critical environment.
  • Experience with configuration management systems (e.g., Puppet, Ansible, Chef).
  • Experience with physical server management: hardware features and performance attestation, BMC/iLO automation, vendor relationships, and customer support escalations.

What the JD emphasized

  • large-scale distributed systems
  • large storage devices