SQL has ruled the database world for years as a widespread, standard query language that represents data in a common, relational table. However, this isn’t always appropriate for modern applications that generate a great deal of data and utilize certain back-end and Big Data storage models. For example, querying user event data contains various types of messages that may be difficult and sometimes impossible to index or store in a relational table. In spite of the shift to Big Data, analysts still want to leverage know-how, skills and years of experience with relational database engines in order to generate insights. In this post, we share how CoolaData represents log and event-based Big Data in relational, schema-like structures.
CoolaSQL: Running SQL Queries on Big Data
A comprehensive Big Data querying capability does exist. We call it CoolaSQL, or CQL. CQL’s behavioral extensions allow users to use SQL syntax to jump-start user behavior analysis, including user paths, cohorts, and funnels, while CoolaData performs non-relational database operations internally. For instance, as shown below, analysts can execute user path queries, find patterns in application usage and group by similar user patterns behind the scenes. This way analysts receive query result sets in an organized, relational manner:
Path Query Sample:
SELECT path, count(session_id) as sessions_count, avg(path_count()) as avg_path_steps
WHERE date_range(last 120 days)
CLUSTER PATH BY session
START WITH “login” FOLLOWED BY “add_to_cart” 3 times NOT FOLLOWED BY “buy”
HAVING PATH path_count() > 10
GROUP BY path
ORDER BY sessions_count DESC
Much More Than SQL: The “CLUSTER BY” Statement
CoolaData supports ANSI standard SQL queries (e.g. select, from, where, group by), joins and some language constructs of our own. For example, analysts may specifically want to group users according to the time it takes them to move from one event to another, depending on how they surf through a website or a mobile app funnel. By using CoolaData’s funnel analysis, analysts identify problematic steps in user flow, understand where users are churning and so on. Furthermore, CoolaData simplifies the grouping of users according to conversion funnels, since a simple CQL query can generate a great deal of clarity regarding user behavior. We’ll get into specifics below.
How CQL Works: The Query Shipment Mechanism and Smart Cache
We made sure not to burden analysts with internal data structuring or query performance when we developed the CQL framework. Therefore, event and log data has been organized on three different levels:
- Event data includes raw data from user activities.
- Session data is an aggregation of the raw event data that is related to its corresponding user session.
- User data is associated with specific user features.
- “Behavioral aggregations” are our internal representation of time series data – a NoSQL approach.
Let’s take an online bookstore for example. The event data might be the names of the books a specific user views. The session data relates to the names of the books that are viewed during a specific session. Finally, user data consists of book preferences.
These levels of data are utilized by CoolaData’s query shipment mechanism to analyze queries and decide on a level for optimal execution. For instance, a query may be fulfilled by simply looking at specific user data. This approach optimizes the querying process by translating and running queries with a separate back-end that is appropriate for each level. For example, when running a path analysis, we ship its relevant data to the most appropriate back-end level.
In addition to all that, we perform smart caching of some of the data in order not to scan all of the data each time.
Bottom line: we parse and analyze each query so that you don’t have to worry about it.
CoolaData Query API
As a managed data warehouse, it is very important for us to give customers the flexibility needed to execute any statement on their raw data so that they can generate valuable information and insights. CoolaData’s query API provides a powerful way to execute query languages and customers are able to perform tasks that were never before thought possible. CQL allows users to generate automated tasks that include running queries and retrieving results.
CoolaData supports IT system data analysis. With CQL, our customers perform comprehensive “SQL style” queries such as joins. Customers want to see how users behave per traffic source channel, along with the budget that was allocated to that given channel. This information is stored outside of their CoolaData database, in places like their CRM or Google Ads account.
Eventually, we all want to get insights from our data by utilizing it in a correct, fast and simple manner. This is why CoolaData created CQL. We have been able to take this innovative approach to structuring and storing data and align it with other systems within the environment, which results in great data insights, across the board.
Learn more about CQL or sign up for a demo to see CoolaData in action.