About ClickHouse

Recognized on the 2025 Forbes Cloud 100 list, ClickHouse is one of the most innovative and fast-growing private cloud companies. With more than 3,000 customers and ARR that has grown over 250 percent year over year, ClickHouse leads the market in real-time analytics, data warehousing, observability, and AI workloads.

The company’s sustained, accelerating momentum was recently validated by a $400M Series D financing round. Over the past three months, customers including Capital One, Lovable, Decagon, Polymarket, and Airwallex have adopted the platform or expanded existing deployments. These customers join an established base of AI innovators and global brands such as Meta, Cursor, Sony, and Tesla.

We’re on a mission to transform how companies use data. Come be a part of our journey!

About the role

We are committed to providing our customers with reliable and secure services, so we are expanding our central Site Reliability Engineering team. You will be responsible for building and leading processes that ensure the reliability, availability, scalability, and performance of our cloud infrastructure. You will collaborate with teams such as Control Plane, Data Plane, Core, Security, Support, and Operations, and guide them in designing and implementing scalable, secure, highly available, and fault-tolerant distributed systems. You will also own incident management and response, post-mortem analysis (including running blameless post-mortems), and the continuous improvement of our cloud services. You will leverage your software engineering expertise to build platforms and tools that improve the operational and engineering efficiency of ClickHouse Cloud. This role is a unique opportunity to make a significant impact on ClickHouse Cloud, our elastic, high-performance service built for limitless scale.

What will you do?

Collaborate with various engineering teams in ClickHouse to design and implement scalable, secure, and highly available systems for ClickHouse.
Establish and manage service level objectives (SLOs) and service level agreements (SLAs) for ClickHouse Cloud.
Ensure all infrastructure components in ClickHouse Cloud (including Data Plane, Control Plane, ClickHouse Core, etc.) have monitoring and alerting in place for timely detection and resolution of incidents.
Enhance and refine incident response processes and post-mortem analysis for any outages in ClickHouse Cloud, including working with the Support team to communicate with impacted customers.
Continuously improve the reliability and performance of our ClickHouse services.
Plan, enable, and drive chaos engineering initiatives across engineering teams based on internal priorities.
Manage on-call processes to respond to performance and reliability issues, and establish best practices for coordinating escalation to resolve issues and minimize downtime.

About you:

Bachelor’s or Master’s degree in Computer Science or a related field.
At least 8 years of experience in Site Reliability Engineering or a related field.
Hands-on experience with Go and/or Python.
Strong knowledge of cloud computing platforms such as AWS, Azure, or Google Cloud Platform.
Excellent understanding of distributed databases and SQL; experience with ClickHouse in particular is a major plus.
Hands-on experience with container orchestration tools such as Kubernetes or Docker Swarm.
Strong experience with automation and configuration management tools such as Ansible, Terraform, or Puppet.
You are a strong problem solver and have solid production debugging skills.
You are passionate about efficiency, availability, scalability, and data governance.
You thrive in a fast-paced environment and see yourself as a partner to the business, sharing the goal of moving it forward.
You have a strong sense of responsibility, ownership, and accountability.
Excellent communication and interpersonal skills.

Compensation

For roles based in the United States, the typical starting salary range for this position is listed above. In certain locations, such as the San Francisco Bay Area and the New York City Metro Area, a premium market range may apply, as listed.

These salary ranges reflect what we reasonably and in good faith believe to be the minimum and maximum pay for this role at the time of posting. The actual compensation may be higher or lower than the amounts listed, and the ranges may be subject to future adjustments.

An individual’s placement within the range will depend on various factors, including (but not limited to) education, qualifications, certifications, experience, skills, location, performance, and the needs of the business or organization.

If you have any questions or comments about compensation as a candidate, please get in touch with us at paytransparency@clickhouse.com.

Perks

Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in over 20 countries.
Healthcare - Employer contributions towards your healthcare.
Equity in the company - Every new team member who joins our company receives stock options.
Time off - Flexible time off in the US, generous entitlement in other countries.
Home office - A $500 home office setup if you’re a remote employee.
Global Gatherings - We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites.

Culture - We All Shape It

As part of a rapidly scaling startup, you will be instrumental in shaping our culture.

Are you interested in finding out more about our culture? Learn more about our values here. Check out our blog posts or follow us on LinkedIn to find out more about what’s happening at ClickHouse.

Equal Opportunity & Privacy

ClickHouse provides equal employment opportunities to all employees and applicants and prohibits discrimination and harassment of any type based on factors such as race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.

Please see here for our Privacy Statement.

Senior Site Reliability Engineer - Remote