Rich Behavioral Analytics using Pentaho and CoolaData

Masses of user-generated event data are collected from this busy portal, which offers diverse content such as news, sports, culture, fashion and food, as well as video and music on demand. The volume of raw event data collected from its websites and apps is staggering, and understanding users’ behavior is vital to the publishing business.

As a data management expert, I took on the challenge of integrating user behavioral analytics into the organization. The goal was to maximize measurement and analytics capabilities and to provide the most complete understanding of the portal users’ behavior, which in turn would lead to optimization and growth.

A Boost Towards Behavioral Analytics

The Pentaho open source solution was already in place, handling data collection and ETL and integrated with a visualization tool. That alone, however, was not delivering deep insights into user behavior, which are so crucial for publishers.

CoolaData was introduced as a one-stop shop for big-data behavioral analysis, covering everything from capturing event data at its very inception to advanced behavioral analytics and visualization. When I first heard this, I was skeptical. How can a single solution address this enormous data landscape? After all, there are dozens of products out there handling various data platform components such as data collection, ETL, visualization and data warehousing, among others. Typically, companies acquire and struggle with several separate tools to turn raw data into insights.

I was curious to find out how this new breed of analytics could really be a plug-n-play BI solution. CoolaData handles event tracking flawlessly and is very convenient to manage. The data goes into Google BigQuery and is kept accessible at row level through JDBC and CoolaSQL, which contains the behavior-specific clauses. The native reports and dashboards are designed to deliver behavioral analytics to the various departments in the organization.

The Story of a Beautiful Integration

I was looking for a solution that wouldn’t undermine the initial investment in Pentaho, because I found the custom workflows, consolidations, rankings and dashboards that had been designed especially for the editors particularly useful.

I was able to accomplish this using CoolaData coupled with Pentaho-Data-Integration (PDI), an ETL tool that enables you to manage all of the jobs in your BI environment.

I was able to build an enterprise-grade behavioral analytics solution that uses the power of both products, with these advanced capabilities:

I created a BigQuery dataset linked to the CoolaData BQ project and then ran SQL on both. This openness and flexibility enables the development of complex data processes: SQL procedures pull out and crunch the data from CoolaData and then write it to the BQ dataset, which can then be queried with a SQL client.

I combined data from external sources into the linked BQ dataset, to enrich CoolaData event data with dimensional data. We used this to load a metadata table of customer accounts with their IDs, which helped us analyze behavior by account_id rather than adding more account-level information.

I then transferred data to another destination by creating a PDI job that queries and exports data from CoolaData. This is needed when data from the analytics environment is required in another system. I used such a process to feed a reporting system for the company’s partners: data from the CoolaData project was aggregated in the BQ dataset and then transferred to a MySQL database.
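The three steps above (query the linked dataset, enrich events with account metadata, aggregate and transfer) can be sketched end to end. This is only an illustration under loud assumptions: SQLite stands in for both BigQuery and MySQL so that the sketch runs anywhere, plain Python stands in for the PDI job, and every table and column name other than account_id (events, accounts, partner_report, user_id, event_name, segment) is hypothetical.

```python
import sqlite3

# Stand-in for the linked BigQuery dataset (assumption: SQLite, not BigQuery).
warehouse = sqlite3.connect(":memory:")
warehouse.executescript("""
    CREATE TABLE events (user_id TEXT, account_id INTEGER, event_name TEXT);
    CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, segment TEXT);
    INSERT INTO events VALUES
        ('u1', 1, 'read_article'), ('u1', 1, 'watch_video'),
        ('u2', 2, 'read_article'), ('u3', 2, 'read_article');
    INSERT INTO accounts VALUES (1, 'premium'), (2, 'free');
""")

# Enrich events with the account dimension, then aggregate per segment and event.
rows = warehouse.execute("""
    SELECT a.segment, e.event_name, COUNT(*) AS event_count
    FROM events AS e
    JOIN accounts AS a ON a.account_id = e.account_id
    GROUP BY a.segment, e.event_name
    ORDER BY a.segment, e.event_name
""").fetchall()

# Stand-in for the MySQL reporting database (assumption: SQLite, not MySQL).
reporting = sqlite3.connect(":memory:")
reporting.execute(
    "CREATE TABLE partner_report (segment TEXT, event_name TEXT, event_count INTEGER)"
)
reporting.executemany("INSERT INTO partner_report VALUES (?, ?, ?)", rows)
reporting.commit()

report = reporting.execute(
    "SELECT * FROM partner_report ORDER BY segment, event_name"
).fetchall()
print(report)
```

Only the small aggregated result crosses from the warehouse to the reporting database, which is the point of aggregating in the BQ dataset before the transfer.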

Querying away

With a comprehensive solution in place, the analysts can apply behavioral analytics to build full, rich behavioral profiles of users, which are then used for optimization and personalization.

Using CoolaSQL, CoolaData’s behavioral query language (similar to BigQuery SQL, with CoolaData’s behavior-specific query extensions), the analysts easily perform cohort analysis, path analysis and funnel analysis to their heart’s content.
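CoolaSQL’s own syntax is not reproduced here. To make the idea of a funnel concrete, the following is a minimal plain-Python sketch of what a funnel analysis computes: how many users reach each step of an ordered sequence of events. The function name, the event names and the sample data are all hypothetical.

```python
from collections import defaultdict


def funnel_counts(events, steps):
    """Count users who reached each funnel step, in order.

    events: iterable of (user_id, timestamp, event_name) tuples.
    steps: ordered list of distinct event names defining the funnel.
    """
    by_user = defaultdict(list)
    for user_id, timestamp, event_name in events:
        by_user[user_id].append((timestamp, event_name))

    counts = [0] * len(steps)
    for user_events in by_user.values():
        next_step = 0
        # Walk the user's events chronologically, advancing one step at a time.
        for _, event_name in sorted(user_events):
            if next_step < len(steps) and event_name == steps[next_step]:
                counts[next_step] += 1
                next_step += 1
    return dict(zip(steps, counts))


# Hypothetical sample events: (user_id, timestamp, event_name).
events = [
    ("u1", 1, "visit"), ("u1", 2, "read_article"), ("u1", 3, "subscribe"),
    ("u2", 1, "visit"), ("u2", 2, "read_article"),
    ("u3", 1, "read_article"), ("u3", 2, "visit"),
]
result = funnel_counts(events, ["visit", "read_article", "subscribe"])
print(result)
```

A user only counts toward a step after completing the previous ones, so an article read before the first visit does not advance the funnel.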

Now, each department has its own custom dashboards that provide real-time reports of relevant user activities. On top of that, analysts combine reports to answer questions such as: How long did this type of consumer watch a certain video, and what did they do next? Which articles did this consumer choose to read over which time periods? How do most users access a given shopping site during the afternoon? Which method of access is more likely to result in an actual sale?
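A question like “what did the user do after watching a video?” is a simple form of path analysis. As a rough sketch only, assuming events are available as (user_id, timestamp, event_name) tuples, the plain-Python function below counts which event immediately follows a chosen anchor event. The function name, event names and data are hypothetical and do not reflect CoolaSQL.

```python
from collections import Counter, defaultdict


def next_event_counts(events, anchor):
    """Count which event each user performed immediately after `anchor`."""
    by_user = defaultdict(list)
    for user_id, timestamp, event_name in events:
        by_user[user_id].append((timestamp, event_name))

    followers = Counter()
    for user_events in by_user.values():
        # Order each user's events by time, then look at consecutive pairs.
        ordered = [name for _, name in sorted(user_events)]
        for current, following in zip(ordered, ordered[1:]):
            if current == anchor:
                followers[following] += 1
    return followers


# Hypothetical sample events: (user_id, timestamp, event_name).
events = [
    ("u1", 1, "watch_video"), ("u1", 2, "read_article"),
    ("u2", 1, "watch_video"), ("u2", 2, "share"),
    ("u3", 1, "read_article"), ("u3", 2, "watch_video"), ("u3", 3, "read_article"),
]
followers = next_event_counts(events, "watch_video")
print(followers.most_common(1))
```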

I was impressed with how agile and open CoolaData is and how quickly I could integrate it with existing tools and processes to create a seamless solution that provides cutting-edge behavioral analytics. The integration with BigQuery is great; I don’t need to worry about the infrastructure and I have 100% flexibility with the analytic environment.

That publisher is happy with the new behavioral analysis capabilities and is already optimizing the experience and seeing business growth.

Want to learn more about customized behavioral analytics solutions using Pentaho or any other common tools?
Contact us at iknowlogy
