Completing the Data Science Pipeline


In this post, we’ll show how we’re bringing the power of machine learning to Cooladata, and why we’re so excited to be completing the data science pipeline.

Sometimes, the statistics and model building are the easiest parts of data science. The hardest part is often getting all of your data into one place: fully aggregated, deduplicated, fresh, and in a format conducive to modelling. That, of course, assumes you already know how to tackle a business question with data science, like predicting customer conversion or churn, or building behavioral clusters. And once you’ve built and trained a model, how do you deploy it? We’ll cover all of these questions in this post.

In my previous post, I discussed how we’re bringing R and Python to Cooladata, what that means for data scientists, and how to generate custom charts and graphs using matplotlib in Python or ggplot2 in R. Today, I’ll share a way to leverage Cooladata for predictive modelling without even leaving your browser.

In this example, we’ll build and train a logistic regression model that attempts to predict whether customers will request a demo on a website. Let’s get started!

Step 1: Load your Packages

To get started, we’ll build an aggregation table and select “R” for the type. We’ll then be presented with an area in which we can write our code.

In this case, we’ll be using the glm() function within R, which is included in the stats package. The only package we need to add is dplyr for some basic data manipulation.
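Since glm() ships with base R’s stats package (which loads automatically), the setup is minimal — dplyr is the only add-on to load:

```r
# glm() lives in the stats package, which is attached by default in R,
# so dplyr is the only extra package we need for data manipulation.
# install.packages("dplyr")  # uncomment on first use
library(dplyr)
```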


Step 2: Get your Data

Leveraging Cooladata’s CQL syntax, we’ll select the users and properties from our database that we want to add as features to our model. Keep in mind that if we had any external profile data from a CRM or transaction database, we could write a JOIN query to incorporate it into the model as well.
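The exact query depends on your event schema, but the result arrives in R as an ordinary data.frame. As a stand-in (the column names here are hypothetical, not Cooladata specifics), the feature table might look like this:

```r
# Hypothetical stand-in for the feature table a CQL query might return.
# Each row is a user; the last column is the label we want to predict.
users <- data.frame(
  user_id        = 1:6,
  sessions       = c(1, 8, 3, 12, 2, 7),
  pages_viewed   = c(4, 35, 10, 60, 5, 28),
  pricing_visits = c(0, 3, 2, 5, 1, 2),
  requested_demo = c(0, 1, 0, 1, 0, 1)  # did the user request a demo?
)
str(users)
```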

Step 3: Build and Train your Model

Here’s where it gets interesting. In the same window, we can start referencing the data we imported as a data.frame directly within our R code. I’ll walk through the specifics of building and training the model in the video below:
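As a rough sketch of the training step — using a small, hypothetical feature table in place of real query results — logistic regression in R comes down to a single glm() call with family = binomial:

```r
# Hypothetical training data; in practice this data.frame comes from
# the CQL query in the previous step.
users <- data.frame(
  sessions       = c(1, 8, 3, 12, 2, 7, 9, 4, 6, 5),
  pricing_visits = c(0, 3, 2, 5, 1, 2, 4, 0, 1, 2),
  requested_demo = c(0, 1, 0, 1, 0, 1, 0, 0, 1, 1)
)

# Fit a logistic regression: glm() with family = binomial models the
# probability that a user requests a demo given their behavior.
model <- glm(requested_demo ~ sessions + pricing_visits,
             data = users, family = binomial)

# Predicted probabilities for the training set
probs <- predict(model, type = "response")
round(probs, 2)
```

The formula interface (`label ~ feature1 + feature2`) makes it easy to add or drop features as you iterate on the model.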

Step 4: Export the Results to Cooladata

After we’ve written the code for our model, we’ll set the table name, write mode, and update frequency. Keep in mind that you can time-bound your initial CQL query so it runs only on the most recent data (the last 90 days, for example).


Once you return the data.frame to Cooladata, you’ll be able to access the results the same way you’d access any other table in Cooladata. This includes accessing it via the query API if you intend to build low-latency services on top of these results.
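The data.frame you hand back might be as simple as a user ID alongside a predicted probability. A self-contained sketch (model and column names are illustrative, not Cooladata specifics):

```r
# Rebuild the hypothetical model from the training step so this
# snippet stands on its own.
users <- data.frame(
  user_id        = 1:10,
  sessions       = c(1, 8, 3, 12, 2, 7, 9, 4, 6, 5),
  pricing_visits = c(0, 3, 2, 5, 1, 2, 4, 0, 1, 2),
  requested_demo = c(0, 1, 0, 1, 0, 1, 0, 0, 1, 1)
)
model <- glm(requested_demo ~ sessions + pricing_visits,
             data = users, family = binomial)

# Score every user; a tidy data.frame like this is the shape you'd
# return to Cooladata as the aggregation table.
results <- data.frame(
  user_id    = users$user_id,
  demo_score = round(predict(model, users, type = "response"), 3)
)
head(results)
```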


Next Steps

Now that you have a model that’s continuously learning from your users’ behavior, there are several different things you can do with it within Cooladata.

Create Intelligent Segments: When you create a segment in Cooladata, you can admit only users whose predictive score crosses a certain threshold. By accessing this information via the query API, you can dynamically tailor each user’s experience based on the segment they’re in.


Create CQL Reports: Since you now have an aggregation table with predictive values for your users, this information is undoubtedly going to be helpful for your support, sales and account management teams. Want to know which users are highly likely to churn? Create a report!


Export the Data for External Applications: If you have a list of the 10% of your prospects most likely to convert to paying customers, there’s a lot you can do with that list. You could upload it to your Google AdWords or Facebook remarketing engine to create a custom audience, for example. Alternatively, you could launch a custom email campaign using HubSpot, MailChimp, or any other system, specifically targeting pre-purchase intent.
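The segment-threshold idea above can be sketched in R with dplyr — the scores and the 0.6 cutoff here are arbitrary examples:

```r
library(dplyr)

# Hypothetical scored results, as produced by the modelling step.
results <- data.frame(
  user_id    = 1:6,
  demo_score = c(0.12, 0.87, 0.34, 0.91, 0.05, 0.66)
)

# Users whose predicted probability clears the threshold enter the
# segment; everyone else is excluded.
hot_leads <- results %>% filter(demo_score >= 0.6)
hot_leads$user_id
```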

You can find more information about the query API here:

The possibilities are endless; this is just a small taste of what you can do. If you’re still stuck building and training models outside of your data repositories, feel free to reach out!
