April 23, 2024

  • 10 minutes

Product Analytics with BigQuery and Mitzu

István Mészáros

Product analytics with BigQuery and Mitzu


In the field of data analytics, warehouse-native platforms like Mitzu are transforming the way companies utilize their data warehouses, such as those running on BigQuery.

By integrating directly with BigQuery, Mitzu removes the necessity for traditional reverse ETL processes, facilitating real-time analytics on existing data infrastructures. This method allows businesses to fully utilize their data without the need to replicate it in various systems.

Conneting Mitzu and BigQuery provides simplifies the process for organizations to derive actionable insights from their product usage data. 

This blog post dives into how leveraging Mitzu over BigQuery for product analytics can transform data strategies by providing actionable insights with significantly reduced overhead and complexity.

Ten reasons to choose BigQuery

Here is the top 10 points why any B2C or B2B product development company should choose BigQuery for their data centralisation efforts.

  • Serverless and Fast: BigQuery allows for super-fast SQL queries without the need to manage any server infrastructure.
  • Highly Scalable: Designed to handle petabytes of data effortlessly, it provides real-time data analysis capabilities.
  • Low Maintenance: As a serverless system, it reduces the complexities of setup and ongoing server management.
  • Integration with Google Cloud: BigQuery works seamlessly with other Google Cloud services, enhancing overall data analytics and machine learning capabilities.
  • Standard SQL Support: It supports standard SQL queries, making it easily accessible to those familiar with SQL.
  • Cost-Effective: Offers a pay-as-you-go model, charging only for the data processed, which helps in managing costs effectively.
  • Advanced Features: Includes built-in machine learning and geospatial analysis tools for more sophisticated data handling within the warehouse.
  • Robust Security: Ensures top-notch security with robust data encryption, strict access controls, and compliance with various standards to protect sensitive information.
  • Ideal for Frequent Analysis: Perfect for businesses that require frequent and quick analysis of large datasets to drive quick decision-making.
  • Industry Impact: BigQuery is a leading tool in the data analytics field, valued across various industries for driving insights and strategic outcomes.

Out of these ten I would highlight the Cost-Effective point. This is especially important for startups, who are usually on a tight budget.

Setting up BigQuery for warehouse-native product analytics

Among the numerous impressive features of BigQuery SQL, I would like to draw your attention to 3 that are not only crucial for product analytics with large datasets, but also uniquely advantageous.

  • Partitioned Tables: BigQuery allows partitioning of tables by various criteria, including date, which can significantly improve query performance and cost by scanning only relevant partitions.
  • Clustered Tables: By clustering data based on specific columns, BigQuery can provide more efficient data retrieval, optimizing cost and performance.
  • Materialized Views: These are pre-computed views that periodically cache a query's results for faster access, improving performance for frequent and computationally expensive queries.

These 3 features enable real-time data analytics on petabyte-scale datasets. Although present in other data warehouses or data lake solutions, the simplicity of enabling these features in BigQuery makes it stand out.

Modeling events

As I have discussed in this blog post, there are three primary ways to model product events in a data lake or data warehouse.
I am not going to cover the benefits and drawbacks here. Instead, I chose the single event table model and showcased warehouse native product analytics with an example.

If you use customer data platform solutions like Segment, RudderStack or Jitsu you maybe familiar with the data model. Every event that you track on the client or backend side of your product will be collected into a single table. Each column in these tables represent the properties of the events. This is a simple and great model how to track analytics events in a data warehouse.

Single event table Model in BigQuery

BigQuery has a 3 level namespace system where your product event tables most likely sit in a under a single BigQuery Project and Dataset.

BigQuery event table example

An essential thing to mention is mandatory columns. Each product event table has a user identifier and event time columns. While the name of the table represents the name of the action the user performed in the product. It is essential to have all mandatory columns be named the same, e.g., USER_ID, CLIENT_EVENT_TIME. 

Product analytics over data in BigQuery

Traditionally, there were two ways of learning about your product usage with BigQuery:

  • Reverse ETL the event tables to a 3rd party product analytics application (like Amplitude or Mixpanel)
  • Good old SQL queries for funnel, retention, and user segmentation.

Both options are mediocre at best. 3rd party solutions are costly as they need to copy all your data. Besides that, their insights often diverge from your SQL-based insights.
Product analytics tools, however, are essential to a data-driven organization as they enable the non-tech folk  (product managers, marketing, sales, customer success) to understand product usage behavior.

There is a new and emerging way for product analytics that is best described as warehouse-native.
This approach kind of merges the best of the two worlds:

  • It writes native BigQuery SQL queries over the tables in your dataset.
  • It offers an intuitive user interface for the non-tech colleagues.

On top of this, they are significantly cheaper than 3rd party solutions (Amplitude, Mixpanel, Heap, etc).
These solutions are cost-effective because they avoid duplicating your data. The warehouse-native product analytics solution acts simply as a thin user interface layered over your data warehouse. Furthermore, these solutions ensure that data security remains intact.

Mitzu as warehouse-native product analytics

Mitzu is one of the leading warehouse native product analytics solutions.
It enables the aforementioned non-tech folk to learn directly from the data wareshouse the following insights:

  • conversions rates
  • retention rates
  • average time to convert in a funnel
  • user segments comparison
  • revenue metrics
  • and many more.

Besides that, it has excellent charting capabilities:

  • Area
  • Line
  • Bar, Stacked bar
  • Geographical visualizations
  • Sankey charts
Funnel analytics in Mitzu with BigQuery

Users of Mitzu can interact with the data in BigQuery as they would interact with data in Amplitude or Mixpanel.
Each click shows a new learning about the users of your product.
Creating user cohorts, dashboards and showing user properties (like email or origin country) is always one click away.

Dashboards in Mitzu

As your company doesn't have to set up a reverse ETL pipeline or any other data ingestion mechanism to 3rd party tools, Mitzu can run on a fixed price model.  
It doesn't care about your monthly tracked users or events. Typical warehouse-native applications have either seat-based pricing or fixed pricing models.

BigQuery is a robust and capable technology. It optimizes the generated SQL queries sufficiently, so a typical product analytics query runs almost as fast as it would run in Amplitude or Mixpanel over their proprietary storage solution.

Integrating Mitzu with BigQuery

Integrating Mitzu with BigQuery for a typical data warehouse that has, for example, 500 event tables takes around 5 minutes.
You only need to create a user in BigQuery that has permission to access (read-only) the tables in the catalog.
Once the user is ready, you must share the credentials with Mitzu. Remember, you control what permissions you give to your user.

BiqQuery connection dialog


Select the 500 tables containing your product usage events as a second step. Choose the user identification and event time columns and, optionally, the date partition (if you have those).
As the last step, click "Save and Update Catalog". This step should take a minute or two.

After this step, head to the Explore page and start understanding your users.


Summary

In this post, I wanted to showcase how you can benefit from using BigQuery and a warehouse-native product analytics solution like Mitzu. 

Integrating Mitzu with your BigQuery data warehouse only takes 5 minutes, and you can control which tables it can reach and which can't. After the setup is ready, you and your non-tech colleagues, like product managers, marketing specialists, and sales representatives, can access the data warehouse in a self-served way without ever learning SQL.

Here you can find out more about the features Mitzu offers for any B2C or B2B product development company.

Explore warehouse native product analytics

See how you can benefit from warehouse native product analytics

Blogs for your growth