•

Using Mitzu With Trino For Product Analytics

Explore how to use Mitzu.io with Trino and PrestoSQL for fast, scalable product analytics and empower your team with easy, SQL-free data access.

István Mészáros

March 6, 2024

•

5 min read

Share this post

Clickhouse and Mitzu warehouse-native integration

Overview

Heading 2

Subscribe to our newsletter

Join 1000+ Data and Analytics professionals staying up-to-date with Mitzu's newsletter.

Thank you! You have been subscribed!

Oops! Something went wrong while submitting the form.

Transforming Product Analytics with Trino and Mitzu

In the realm of data analytics, warehouse-native product analytics platforms, such as Mitzu, are revolutionizing how companies leverage their data warehouses, including those powered by Trino. By directly interfacing with Trino, Mitzu eliminates the need for traditional reverse ETL processes, enabling real-time analytics directly on top of existing data infrastructure. This approach ensures that businesses can harness the full power of their data without duplicating it across disparate systems. The integration between Mitzu and Trino offers unprecedented scalability, flexibility, and speed in querying vast datasets, making it easier for organizations to find actionable insights from their product data.

This blog post delves into how leveraging Mitzu over Trino for product analytics can transform data strategies by providing more profound, more actionable insights with significantly reduced overhead and complexity.

What Is Trino? A Modern SQL Engine for Big Data

In the ever-evolving landscape of big data analytics, Trino stands out as a powerful distributed SQL query engine designed for interactive query performance over large-scale data lakes. Originally developed by Facebook and known as PrestoSQL, it was rebranded to Trino to avoid confusion with PrestoDB. Trino facilitates querying data where it lives, including in Hadoop, S3, Azure Blob Storage, Google Cloud Storage, and many relational database systems, without data movement or transformation.

Trino's architecture is built for speed and efficiency, employing a low-latency, high-performance engine that supports a variety of data sources through its connector framework. This makes it an ideal tool for data analysts, scientists, and engineers who require real-time insights across different data stores. Its ability to execute analytical queries against various data sources with ANSI SQL support allows for complex data analysis and aggregation without needing separate data pipelines.

Setting Up Trino for Warehouse-Native Product Analytics

This post will look at Trino as a classic data lake solution. Trino has many great features, like joining between two completely different data sources or functioning as an ELT/ETL solution. However, those features are not relevant for product data analytics.

Modeling Events

As I have discussed in this blog post, there are only three ways to model product events in a data lake or warehouse.
I am not going to cover the benefits and drawbacks here. Instead, I chose the single event table model and showcased warehouse native product analytics with an example.

Modeling Events in Trino: The Single Event Table Approach

Trino has a three-level namespace system where your product event tables sit in a single catalog under a single Schema.

These product event tables may be partitioned on the storage side (e.g., S3).
Ideally, there is only one partition column, DATE. Having multiple date-type partition columns in a table, like YEAR, MONTH, or DAY, is impractical. It makes SQL query development a pain for data analysts.

Another essential thing to mention is mandatory columns. Each product event table has a user identifier and event time columns. While the name of the table represents the name of the action the user performed in the product. It is essential to have all mandatory columns be named the same, e.g., USER_ID, CLIENT_EVENT_TIME.

Traditional vs. Warehouse-Native Product Analytics with Trino

The Limitations of Traditional Reverse ETL and SQL Queries

Traditionally, there were two ways of learning about your product usage with Trino:

Reverse ETL the event tables to a 3rd party product analytics application (like Amplitude or Mixpanel)
Good old SQL queries for funnel, retention, and user segmentation.

Both options are mediocre at best. 3rd party solutions are costly as they need to copy all your data. Besides that, their insights often diverge from your SQL-based insights.
Product analytics tools, however, are essential to a data-driven organization as they enable the non-tech folk (product managers, marketing, sales, customer success) to understand product usage behavior.

The Advantages of Warehouse-Native Product Analytics

There is a new and emerging way for product analytics that is best described as warehouse-native. This approach kind of merges the best of the two worlds:

It writes native Trino SQL queries over the data lake.
It offers an intuitive user interface for the non-tech folk.

On top of this, they are significantly cheaper than 3rd party solutions (Amplitude, Mixpanel).
These solutions achieve cost efficiency by not copying your data. The warehouse-native product analytics solution is a thin user interface over your data lake. Last but not least, data security is not compromised.

How Mitzu.io Enhances Product Analytics on Trino

Key Insights You Can Unlock with Mitzu

Mitzu is one of the leading warehouse native product analytics solutions.
It enables the aforementioned non-tech folk to learn directly from the data lake the following insights:

conversions rates
retention rates
average time to convert in a funnel
user segments comparison
revenue metrics
churn rate
and many more.

Advanced Visualization Features in Mitzu

Besides that, it has excellent charting capabilities:

Area
Line
Bar, Stacked bar
Geographical visualizations
Sankey charts

Users of Mitzu can interact with the data in Trino as they would interact with data in Amplitude or Mixpanel. Each click shows a new learning about the users of your product.
Creating user cohorts, dashboards and showing user properties (like email or origin country) is always one click away.

As your company doesn't have to set up a reverse ETL pipeline or any other data ingestion mechanism to 3rd party tools, Mitzu can run on a fixed price model.
It doesn't care about your monthly tracked users or events. Typical warehouse-native applications have either seat-based pricing or fixed pricing models.

Trino is a robust and capable technology. It optimizes the generated SQL queries sufficiently, so a typical product analytics query runs almost as fast as it would run in Amplitude or Mixpanel over their proprietary storage solution.

Simplifying Integration: Connecting Mitzu to Your Trino Data Lake

Integrating Mitzu with Trino for a typical data lake that has, for example, 500 event tables takes around 5 minutes.
You only need to create a user in Trino that has permission to access (read-only) the tables in the catalog.
Once the user is ready, you must share the credentials with Mitzu. Remember, you control what permissions you give to your user.

‍

Select the 500 tables containing your product usage events as a second step. Choose the user identification and event time columns and, optionally, the date partition (if you have those).
As the last step, click "Save and Update Catalog". This step should take a minute or two.

After this step, head to the Explore page and start understanding your users.

‍

Conclusion: Why Trino and Mitzu Are the Future of Product Analytics

In this post, I wanted to showcase how you can benefit from using Trino and a warehouse-native product analytics solution like Mitzu.

Integrating Mitzu with your Trino data lake only takes 5 minutes, and you can control which tables it can reach and which can't. After the setup is ready, you and your non-tech colleagues, like product managers, marketing specialists, and sales representatives, can access the data lake in a self-served way without ever learning SQL.

Unbeatable solution for all of your analytics needs

Get started with Mitzu for free and power your teams with data!

Book a Demo Start for Free

Blogs for your growth

View all

Data Engineering

Using Mitzu With Trino For Product Analytics

Transforming Product Analytics with Trino and Mitzu

What Is Trino? A Modern SQL Engine for Big Data