February 13, 2024

  • 8mins

Product Analytics with Clickhouse and Mitzu

István Mészáros

Product analytics with Clickhouse and Mitzu

In the realm of data analytics, warehouse-native product analytics platforms, such as Mitzu, are revolutionizing how companies leverage their data warehouses, including those powered by Clickhouse. By directly interfacing with Clickhouse, Mitzu eliminates the need for traditional reverse ETL processes, enabling real-time analytics directly on top of existing data infrastructure. This approach ensures that businesses can harness the full power of their data without duplicating it across disparate systems. The integration between Mitzu and Clickhouse offers unprecedented scalability, flexibility, and speed in querying vast datasets, making it easier for organizations to find actionable insights from their product data. 

This blog post delves into how leveraging Mitzu over Clickhouse for product analytics can transform data strategies by providing more profound, more actionable insights with significantly reduced overhead and complexity.

Introduction to Clickhouse

ClickHouse is an open-source, column-oriented database management system (DBMS) designed for online analytical processing (OLAP) tasks. It excels in handling large volumes of data, including petabytes-scale, with impressive speed and efficiency. Originating from Yandex, Russia's largest technology company, ClickHouse has gained popularity worldwide for its performance and scalability. Its columnar storage architecture allows for faster query execution compared to traditional row-oriented databases, especially when processing analytical queries that touch many rows but only a subset of columns. ClickHouse supports real-time data ingestion, making it suitable for applications requiring up-to-date analytics. It offers various compression techniques to reduce storage costs while maintaining fast query speeds. ClickHouse is highly scalable, supporting distributed database configurations to handle increased loads seamlessly. It features a robust SQL query engine, making it accessible to those familiar with SQL. The system is also highly customizable, with support for user-defined functions, allowing for tailored analytics and processing. With its growing ecosystem and community, ClickHouse continues to evolve, adding features and improvements to meet the demands of modern data analytics.

Setting up Clickhouse for warehouse-native product analytics

This post will look at Clickhouse as a classic data warehouse solution. Clickhouse has many great features, like different types of joins that put focus on performance, or many inbuilt functions like windowFunnel.
These features are very relevant for product analytics and make Clickhouse a great candidate a storage medium for product usage events.

Modeling events

As I have discussed in this blog post, there are only three ways to model product events in a data lake or warehouse.
I am not going to cover the benefits and drawbacks here. Instead, I chose the single event table model and showcased warehouse native product analytics with an example.

Single event table Model in Clickhouse

Clickhouse has a 2 level namespace system where your product event tables sit in a under a single Schema.

Event tables in clickhouse

An essential thing to mention is mandatory columns. Each product event table has a user identifier and event time columns. While the name of the table represents the name of the action the user performed in the product. It is essential to have all mandatory columns be named the same, e.g., USER_ID, CLIENT_EVENT_TIME. 

Product analytics over Clickhouse data

Traditionally, there were two ways of learning about your product usage with Clickhouse:

  • Reverse ETL the event tables to a 3rd party product analytics application (like Amplitude or Mixpanel)
  • Good old SQL queries for funnel, retention, and user segmentation.

Both options are mediocre at best. 3rd party solutions are costly as they need to copy all your data. Besides that, their insights often diverge from your SQL-based insights.
Product analytics tools, however, are essential to a data-driven organization as they enable the non-tech folk  (product managers, marketing, sales, customer success) to understand product usage behavior.

There is a new and emerging way for product analytics that is best described as warehouse-native. This approach kind of merges the best of the two worlds:

  • It writes native Clickhouse SQL queries over the data warehouse.
  • It offers an intuitive user interface for the non-tech folk.

On top of this, they are significantly cheaper than 3rd party solutions (Amplitude, Mixpanel).
These solutions achieve cost efficiency by not copying your data. The warehouse-native product analytics solution is a thin user interface over your data warehouse. Last but not least, data security is not compromised.

Mitzu as warehouse-native product analytics

Mitzu is one of the leading warehouse native product analytics solutions.
It enables the aforementioned non-tech folk to learn directly from the data wareshouse the following insights:

  • conversions rates
  • retention rates
  • average time to convert in a funnel
  • user segments comparison
  • revenue metrics
  • and many more.

Besides that, it has excellent charting capabilities:

  • Area
  • Line
  • Bar, Stacked bar
  • Geographical visualizations
  • Sankey charts
Funnel analytics in Mitzu with Clickhouse

Users of Mitzu can interact with the data in Clickhouse as they would interact with data in Amplitude or Mixpanel. Each click shows a new learning about the users of your product.
Creating user cohorts, dashboards and showing user properties (like email or origin country) is always one click away.

Dashboards in Mitzu

As your company doesn't have to set up a reverse ETL pipeline or any other data ingestion mechanism to 3rd party tools, Mitzu can run on a fixed price model.  
It doesn't care about your monthly tracked users or events. Typical warehouse-native applications have either seat-based pricing or fixed pricing models.

Clickhouse

is a robust and capable technology. It optimizes the generated SQL queries sufficiently, so a typical product analytics query runs almost as fast as it would run in Amplitude or Mixpanel over their proprietary storage solution.

Integrating Mitzu with Clickhouse

Integrating Mitzu with Clickhouse for a typical data warehouse that has, for example, 500 event tables takes around 5 minutes.
You only need to create a user in Clickhouse that has permission to access (read-only) the tables in the catalog.
Once the user is ready, you must share the credentials with Mitzu. Remember, you control what permissions you give to your user.


Select the 500 tables containing your product usage events as a second step. Choose the user identification and event time columns and, optionally, the date partition (if you have those).
As the last step, click "Save and Update Catalog". This step should take a minute or two.

After this step, head to the Explore page and start understanding your users.


Summary

In this post, I wanted to showcase how you can benefit from using Clickhouse and a warehouse-native product analytics solution like Mitzu. 

Integrating Mitzu with your Clickhouse data warehouse only takes 5 minutes, and you can control which tables it can reach and which can't. After the setup is ready, you and your non-tech colleagues, like product managers, marketing specialists, and sales representatives, can access the data warehouse in a self-served way without ever learning SQL.

Explore warehouse native product analytics

See how you can benefit from warehouse native product analytics

Blogs for your growth