Database Reliability Engineer - Core Team

ClickHouse · Data AI · EMEA · Engineering

Database Reliability Engineer on the ClickHouse Core team, focused on improving the reliability, performance, and availability of the ClickHouse database. Responsibilities include monitoring, incident management, post-mortem analysis, and bug fixing. Requires experience with distributed databases, scripting (Python/Shell), and cloud platforms.

What you'd actually do

  1. Continuously improve the reliability and performance of ClickHouse core.
  2. Create and improve ClickHouse metrics and alerts that identify and prevent production problems before they affect customers.
  3. Dig into the most common problems customers encounter in ClickHouse Core to find their root causes, then submit bug fixes and issue reports and suggest improvements.
  4. Enhance and refine the incident response process and post-mortem analysis for outages related to ClickHouse core, including working with the Support and Cloud teams to communicate with impacted customers.
  5. Plan, enable, and drive chaos engineering initiatives across Engineering teams based on internal priorities.

Skills

Required

  • Reliability Engineering
  • QA
  • customer-facing engineering
  • operating ClickHouse or other SQL databases in production
  • distributed database internals
  • SQL
  • Shell scripting
  • Python scripting
  • C++ code understanding
  • AWS
  • Azure
  • Google Cloud Platform
  • problem-solving
  • production debugging skills
  • fast-paced environment
  • global team
  • ownership
  • accountability
  • communication skills

Nice to have

  • ClickHouse experience
  • distributed database internals
  • SQL

What the job description emphasized

  • reliability
  • performance
  • availability
  • scalability
  • incident response
  • post-mortem analysis
  • debugging skills