Why Do You Need An Agile Query Language

Advanced behavioral analytics platforms run complex queries to gather the data they analyze. To do so, they must retrieve information from complex databases and other information sources. This is generally accomplished with a query language, which specifies procedures for retrieving and modifying information. Unlike a standard database query language, an information retrieval query language searches for documents containing relevant information, and it does so by interpreting which results best suit the query.

Information retrieval query languages rely on the implementation of the following:

  • Weighting and relevancy ranking
  • Relevance oriented results
  • Semantic relativism of objects
  • Logic-based probability
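
To make weighting and relevancy ranking concrete, here is a minimal sketch in Python. It is not how any particular platform ranks results. The function name rank, the sample documents, and the TF-IDF-style weighting are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def rank(query, documents):
    """Return the documents that share terms with the query, best match first.

    Hypothetical helper: each query term is weighted by how rare it is
    across the collection (a simple TF-IDF-style score).
    """
    tokenized = [doc.lower().split() for doc in documents]
    total = len(documents)

    def weight(term):
        # Terms that appear in fewer documents receive a higher weight.
        containing = sum(term in words for words in tokenized)
        return math.log((total + 1) / (containing + 1)) + 1

    scored = []
    for doc, words in zip(documents, tokenized):
        counts = Counter(words)
        score = sum(counts[term] * weight(term) for term in query.lower().split())
        scored.append((score, doc))

    # Highest score first; documents with no matching term are dropped.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored if score > 0]


docs = ["user login event", "purchase event funnel", "weekly weather report"]
print(rank("event purchase", docs))
```

Both documents that mention "event" are returned, but the one that also contains the rarer term "purchase" is ranked first, which is the kind of relevance ordering a plain database query does not provide on its own.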

Agile Query Language

The Agile approach to developing a query language breaks the language down into its components. The components are then built on one another until the final result is achieved.

The components of an agile query language are:

  • Grammar
  • Lexer
  • Parser
  • Tree object


Grammar

The grammar of a language, much like that of a spoken language, is the set of rules that defines it. To establish the grammar of a language, you must first understand the language's purpose and function.

There are many forms of generative grammars, including context-free grammars, regular grammars, tree-adjoining grammars, affix grammars, and attribute grammars.

A grammar is a set of rules for transforming strings. Applying the production rules, beginning from the start symbol, generates strings. The language formed by the grammar consists of all distinct strings that can be generated this way.

The following is an example:

Consider the grammar G where N = {S, B}, Σ = {a, b, c}, S is the start symbol, and P consists of the following production rules:

  1. S → aBSc
  2. S → abc
  3. Ba → aB
  4. Bb → bb

This grammar defines the language L(G) = {a^n b^n c^n | n ≥ 1}, where a^n denotes a string of n consecutive a's. Thus, the language is the set of strings that consist of one or more a's, followed by the same number of b's, followed by the same number of c's.
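
The idea of generating a language by applying production rules can be sketched in a few lines of Python. The function name generate and the length limit are assumptions for illustration. The rules are the four productions of G listed above, and the search simply tries every rule at every position.

```python
from collections import deque

# The four production rules of G as (left-hand side, right-hand side) pairs.
RULES = [("S", "aBSc"), ("S", "abc"), ("Ba", "aB"), ("Bb", "bb")]


def generate(max_len):
    """Return every string of L(G) with at most max_len characters.

    Breadth-first search over sentential forms, starting from "S".
    No rule shortens a string, so the length bound guarantees termination.
    """
    seen = {"S"}
    queue = deque(["S"])
    words = set()
    while queue:
        form = queue.popleft()
        if form.islower():
            # Only terminal symbols remain: this string is in the language.
            words.add(form)
            continue
        for lhs, rhs in RULES:
            start = form.find(lhs)
            while start != -1:
                new_form = form[:start] + rhs + form[start + len(lhs):]
                if len(new_form) <= max_len and new_form not in seen:
                    seen.add(new_form)
                    queue.append(new_form)
                start = form.find(lhs, start + 1)
    return sorted(words, key=len)


print(generate(9))
```

With a limit of nine characters the search finds the strings for n = 1, 2, and 3, each with equal runs of a's, b's, and c's.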


Lexer

Lexical analysis, or lexing, is the process of converting a sequence of characters into meaningful tokens. For a language like SQL, the tokens may be ‘number’, ‘SELECT’, ‘whitespace’, and so on. A function that performs lexical analysis is called a lexer. A lexer is usually combined with a parser to analyze the syntax of a language.

Lexing is divided into scanning and evaluating. Scanning breaks the input sequence into groups and categorizes them into token classes. Evaluating converts the raw input characters into a processed value. Some examples of token classes are:

Token Class            Example
Identifier             sum
Assignment Operator    =
Integer Literal        2
Addition Operator      +
End of Statement       ;
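
Scanning and evaluating can be sketched with Python's standard re module. This is a hand-written illustration, not generated lexer output. The token names mirror the token classes above, while the function name lex and the sample statement are assumptions.

```python
import re

# One named pattern per token class. WHITESPACE is scanned and then skipped,
# and MISMATCH catches any character that no other pattern accepts.
TOKEN_SPEC = [
    ("INTEGER_LITERAL", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("ASSIGNMENT_OPERATOR", r"="),
    ("ADDITION_OPERATOR", r"\+"),
    ("END_OF_STATEMENT", r";"),
    ("WHITESPACE", r"\s+"),
    ("MISMATCH", r"."),
]
MASTER_PATTERN = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def lex(text):
    """Turn a string into a list of (token class, value) pairs."""
    tokens = []
    for match in MASTER_PATTERN.finditer(text):
        kind = match.lastgroup  # scanning: which token class matched
        raw = match.group()
        if kind == "WHITESPACE":
            continue
        if kind == "MISMATCH":
            raise ValueError(f"unexpected character {raw!r}")
        # Evaluating: convert the raw characters into a processed value.
        value = int(raw) if kind == "INTEGER_LITERAL" else raw
        tokens.append((kind, value))
    return tokens


for token in lex("sum = 2 + 3;"):
    print(token)
```

Note that the integer literals come out as Python integers rather than strings. That conversion is the evaluating step, while choosing the token class is the scanning step.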

Lexers are commonly generated by a lexer generator, which speeds up development.

For example, here at CoolaData we use ANTLR to generate lexical analyzers and parsers. ANTLR, or ANother Tool for Language Recognition, accepts a grammar for a language as input and generates lexers, parsers, tree parsers, and combined lexer-parsers. A tree parser is a recognizer that simplifies the processing of abstract syntax trees.


Parser

A parser takes the tokens from the lexer and arranges them into a parse tree, abstract syntax tree, or other hierarchical structure. The parser determines if and how the input can be derived from the start symbol of the grammar. This can be accomplished using top-down or bottom-up parsing methods.

Top-down parsing looks for left-most derivations of the input, consuming tokens from left to right.

Bottom-up parsing locates the most basic elements first, then the elements containing them, and so on until it reaches the start symbol.
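
As an illustration of the top-down method, here is a minimal recursive-descent parser in Python for a simple assignment statement. It is a hand-written sketch, not ANTLR output. The token names, the function parse_statement, and the tuple-based tree shape are assumptions made for this example.

```python
def parse_statement(tokens):
    """Parse IDENTIFIER ASSIGN INTEGER (PLUS INTEGER)* END into a nested tuple.

    tokens is a list of (kind, value) pairs, consumed from left to right.
    """
    pos = 0

    def expect(kind):
        # Consume the next token if it has the expected kind, else fail.
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            found = tokens[pos][0] if pos < len(tokens) else "end of input"
            raise SyntaxError(f"expected {kind}, found {found}")
        value = tokens[pos][1]
        pos += 1
        return value

    def expression():
        # expression := INTEGER (PLUS INTEGER)*, grouped to the left.
        nonlocal pos
        node = ("int", expect("INTEGER"))
        while pos < len(tokens) and tokens[pos][0] == "PLUS":
            pos += 1
            node = ("add", node, ("int", expect("INTEGER")))
        return node

    # Start from the top-level rule and work down to the individual tokens.
    name = expect("IDENTIFIER")
    expect("ASSIGN")
    value = expression()
    expect("END")
    if pos != len(tokens):
        raise SyntaxError("unexpected tokens after end of statement")
    return ("assign", name, value)


tokens = [
    ("IDENTIFIER", "sum"), ("ASSIGN", "="), ("INTEGER", 2),
    ("PLUS", "+"), ("INTEGER", 3), ("END", ";"),
]
print(parse_statement(tokens))
```

Each grammar rule becomes one function, and the parser begins at the statement rule and descends toward the tokens. A bottom-up parser would instead begin with the tokens and combine them into larger elements.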

As with the lexer, CoolaData uses ANTLR to generate its parsers.

Tree Object

The tree object, or parse tree, is the output of the parser: an ordered, rooted tree that represents the syntactic structure of a string according to the grammar.
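
An ordered, rooted tree is straightforward to model in code. The sketch below builds the parse tree for the string abc, derived with the grammar rule S → abc from the earlier example. The class name Node and the helpers render and leaves are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a parse tree; children are kept in left-to-right order."""
    label: str
    children: list = field(default_factory=list)


def render(node, depth=0):
    """Return the tree as indented lines, one node per line."""
    lines = ["  " * depth + node.label]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


def leaves(node):
    """Return the leaf labels from left to right."""
    if not node.children:
        return [node.label]
    found = []
    for child in node.children:
        found.extend(leaves(child))
    return found


# Parse tree for "abc": the root S expands directly to a, b, and c.
tree = Node("S", [Node("a"), Node("b"), Node("c")])
print("\n".join(render(tree)))
print("".join(leaves(tree)))
```

Reading the leaves from left to right gives back the original string, which is what it means for the tree to represent the structure of that string.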


Agile query languages help you create queries much faster. Take the example of a cohort. Writing a cohort in SQL would look something like this:


With an agile query language, you could reduce it to this:


CoolaData uses SQL to create complex queries with powerful aggregate functions, which operate on multiple values and rows and return a single summarizing value. With well-planned queries built on the agile method of query language creation, CoolaData draws on its expertise and experience to provide customers with a unique behavioral analytics platform.
