Senior Cloud Performance Engineer - Remote

ClickHouse · Data AI · Cloud Engineering

This role focuses on optimizing the performance and scalability of the ClickHouse Cloud Platform, a distributed database system. Responsibilities include benchmarking, performance analysis, capacity planning, and implementing chaos engineering to ensure system resilience and efficiency. The role requires deep expertise in distributed systems, database performance, and cloud infrastructure.

What you'd actually do

  1. Benchmark system performance, analyze database performance, and size and optimize capacity
  2. Recommend configuration tuning and optimizations to address performance bottlenecks
  3. Plan, enable, and drive chaos engineering initiatives across Engineering teams, based on internal priorities
  4. Develop, deploy and manage tools to systematically run chaos experiments and measure impact
  5. Observe running systems, and identify and prioritize innovative ways to disrupt them

Skills

Required

  • distributed systems performance engineering
  • database benchmarking
  • test automation
  • system engineering
  • performance analysis
  • capacity management
  • Go, C/C++, Java, or similar
  • concurrency
  • multithreading
  • distributed system architectures
  • cloud infrastructure services
  • Kubernetes
  • public cloud provider (AWS, GCP, Azure)
  • production debugging skills

Nice to have

  • Chaos Engineering

What the JD emphasized

  • proven track record of understanding the performance limits of different distributed databases
  • strong background in database benchmarking, test automation, system engineering, performance analysis, and capacity management
  • 6+ years of relevant software development industry experience building and operating scalable, fault-tolerant, distributed systems
  • experience leading and shipping large-scope technical projects in collaboration with multiple experienced engineers