Run R & Python Scripts Directly on Data Stored in Cooladata


There aren’t very many things that get me as excited as launching new features in Cooladata. But this announcement is particularly exciting for me to make, especially since I’m the kind of guy who fumbles through SQL queries just well enough to get the table I’m interested in, then samples it, exports it, and runs it through an R script to do some exploration and finally build some kind of model.

Well friends, the old paradigm with Cooladata and so many other cloud-based data stores comes to an end today. With today’s release, we are launching the ability to run R and Python scripts directly on the data you have stored in Cooladata!

That’s right, you can now use R and Python on your web analytics data joined with the rest of your organization’s data. Not only do you have an entire data management suite in the cloud with Cooladata, but you now have a set of productionized machine learning and AI tools ready to deploy whatever algorithms you can cook up at any interval you specify. Soon, we’ll be launching automated processes that predict the risk of churn, cluster users by their behavior, and predict conversions.

Let’s run through an example of an exploratory analysis. Suppose you want to see whether session duration is affected by device type (mobile, tablet, or desktop). Since I’ll be using R in this example, we’ll start by building an exploratory graph using ggplot2.

Step 1: Add your Packages

The first step is always defining which packages you want to use. In this case, I’m going to be using dplyr and ggplot2.

While we have several popular packages installed natively, feel free to ask your customer success representative for any additional libraries you may need.
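This post doesn't show how Cooladata itself declares packages. As a generic Python sketch (not Cooladata's API), a script can check up front which of the packages it needs are importable. That tells you exactly which libraries to request. The `REQUIRED_PACKAGES` list and the `missing_packages` helper are hypothetical.

```python
import importlib.util

# Hypothetical list of packages a Python script might declare up front,
# analogous to listing dplyr and ggplot2 for the R example.
REQUIRED_PACKAGES = ["json", "statistics"]


def missing_packages(names):
    """Return the top-level package names that cannot be imported here."""
    # find_spec returns None when no installed module matches the name.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_PACKAGES)
    if missing:
        print("Ask your customer success representative for:", ", ".join(missing))
    else:
        print("All required packages are available.")
```

Checking before any analysis runs means a missing library shows up as a clear list of names rather than as an import error partway through a script.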


Step 2: Get your Data

To start, you’ll want to query Cooladata for the data you want to use to build your model. As with other queries in Cooladata, you can leverage Cooladata’s CQL syntax to infer contextual variables from your environment, such as the date range and any filters you’ve applied. If you’re more comfortable using ANSI SQL syntax, you’re more than welcome to use that instead.

Keep in mind that you’re not limited to data inside Cooladata’s event-based data store: you can also join data from your CRM, your external databases, and much more, as long as they are linked to Cooladata.
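In Cooladata you would write such a join in the query itself, in CQL or ANSI SQL. To show what an inner join does to the rows, here is a small Python sketch. It is not how Cooladata executes joins, and the `user_id`, `session_duration`, and `plan` columns are hypothetical.

```python
def inner_join(left, right, key):
    """Combine two lists of row dicts on a shared key, like SQL's INNER JOIN."""
    # Index the right-hand rows by key so each lookup is a dict access.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        # Rows without a match on the right are dropped, as in an inner join.
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


# Hypothetical example: session events joined to CRM accounts on user_id.
events = [
    {"user_id": 1, "session_duration": 120},
    {"user_id": 2, "session_duration": 45},
    {"user_id": 3, "session_duration": 300},
]
crm = [
    {"user_id": 1, "plan": "enterprise"},
    {"user_id": 3, "plan": "free"},
]
print(inner_join(events, crm, "user_id"))
# [{'user_id': 1, 'session_duration': 120, 'plan': 'enterprise'},
#  {'user_id': 3, 'session_duration': 300, 'plan': 'free'}]
```

User 2 has no CRM record, so that session disappears from the result. Use a left join in your query if you want to keep such rows.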

The results of this query will be saved as a data.frame in R.
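An R data.frame is column-oriented: each named column is a vector of the same length, with NA marking missing values. The following Python sketch pivots query rows into that shape, using None in place of NA. It illustrates the data layout only and is not how Cooladata builds the data.frame. The `rows_to_columns` helper and the column names are hypothetical.

```python
def rows_to_columns(rows):
    """Pivot a list of row dicts into a dict of equal-length column lists."""
    # First pass: collect every column name, in order of first appearance.
    columns = {}
    for row in rows:
        for name in row:
            columns.setdefault(name, [])
    # Second pass: fill each column, using None where a row lacks a value.
    for row in rows:
        for name in columns:
            columns[name].append(row.get(name))
    return columns


rows = [
    {"device_type": "mobile", "session_duration": 30},
    {"device_type": "desktop"},  # hypothetical row with a missing duration
]
print(rows_to_columns(rows))
# {'device_type': ['mobile', 'desktop'], 'session_duration': [30, None]}
```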


Step 3: Build your Graphs

Now, here comes the fun part. Right within Cooladata’s web-based console, you can write your R code just like you’d write it within your IDE.

To start with this exploratory analysis, I’m going to look at the averages across device type to see what kind of differences there may be. To do this, I’ll run the following R code leveraging the dplyr and ggplot2 syntax.
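For readers working in Python, which Cooladata also supports, the same per-device averages can be sketched with the standard library. This mirrors what a dplyr `group_by()` plus `summarise()` pipeline computes. The column names `device_type` and `session_duration` (in seconds) and the sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_duration_by_device(rows):
    """Average session duration per device type, like dplyr's group_by() + summarise()."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["device_type"]].append(row["session_duration"])
    return {device: mean(durations) for device, durations in groups.items()}


# Hypothetical sample rows; durations are in seconds.
sessions = [
    {"device_type": "mobile", "session_duration": 60},
    {"device_type": "mobile", "session_duration": 120},
    {"device_type": "desktop", "session_duration": 300},
    {"device_type": "tablet", "session_duration": 150},
]
print(mean_duration_by_device(sessions))
```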


Now, when I hit “Run,” Cooladata loads the packages I specified in Step 1, runs the query from Step 2 and saves the result as a data.frame, and then executes the R code I wrote in Step 3.

And voila! We have our ggplot right inside of our web browser without having to export our data to another IDE.
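The “Run” sequence has three stages: load packages, run the query, and hand the result to your script. It can be sketched as a small Python helper. This is a hypothetical illustration of that ordering, not Cooladata’s actual implementation. `run_report` and its arguments are invented names.

```python
import importlib


def run_report(packages, run_query, script):
    """Hypothetical sketch of the Run sequence: load packages, run the query, run the script."""
    # Step 1: import each declared package; a missing one raises ImportError immediately.
    modules = {name: importlib.import_module(name) for name in packages}
    # Step 2: run the query; its result becomes the script's input table.
    table = run_query()
    # Step 3: execute the user's analysis code against that table.
    return script(table, modules)


# Usage with stand-in functions (all names here are illustrative).
result = run_report(
    packages=["statistics"],
    run_query=lambda: [60, 120, 300],
    script=lambda table, mods: mods["statistics"].median(table),
)
print(result)  # 120
```

Loading packages first means a missing library fails the run before any query time is spent.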


From here on out, we can keep iterating until we have the plot we want. While the previous chart does a good job of showing a measure of center, it tells us nothing about how widely the data are spread around that center. Let’s use a box plot to paint a clearer picture.

Using this code, we can generate a box plot and use the scales package to make the plot a bit easier to read. As you can see, R opens your data up to exploration and visualization with the full range of ggplot2 charts and graphs, which goes well beyond what standard SQL offers.
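For comparison, here is a Python sketch of the numbers a standard box plot summarizes, following the same rules as ggplot2’s `geom_boxplot` defaults:

- The box spans the first to the third quartile, with a line at the median.
- The whiskers reach the most extreme points within 1.5 × IQR of the box.
- Anything beyond the whiskers is an outlier.

The `box_stats` helper and the sample durations are hypothetical. The `"inclusive"` quantile method uses the same linear interpolation as R’s default `quantile()` (type 7).

```python
from statistics import quantiles


def box_stats(values):
    """Numbers behind a Tukey-style box plot: quartiles, whisker ends, and outliers."""
    # "inclusive" interpolates like R's default quantile() (type 7); needs >= 2 values.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the most extreme observations inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Hypothetical desktop session durations in seconds, with one very long session.
print(box_stats([10, 20, 30, 40, 50, 1000]))
# {'q1': 22.5, 'median': 35.0, 'q3': 47.5, 'whisker_low': 10, 'whisker_high': 50, 'outliers': [1000]}
```

The single 1000-second session would pull the mean up to 191.67 seconds. The median stays at 35, which is why the box plot gives a clearer picture than the averages alone.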




Step 4: Add your Reports to the Dashboard

Finally, once you’ve used R or Python to create your visualizations, you can add your reports to your dashboards. With the flexibility of R and Python and Cooladata’s continuously optimized queries, you can easily put advanced behavioral insights drawn from multiple data sources in front of your users.


In the next post, I’ll cover how to use this new functionality to generate tables within Cooladata. Using R and Python, these tables can be smart, like predictive smart. I’ll show you how to build a machine learning algorithm from beginning to end that predicts the likelihood of a website visitor requesting a demo based on several data points. Stay tuned!
