The world is changing rapidly. The past 20 years have seen extreme change in the way companies approach data analysis, and investment in data infrastructure has grown dramatically along with it. Even if you mainly deal with the business aspects of your organization, you are aware that every aspect of business is now open to data collection, and often even instrumented for it: operations, manufacturing, supply-chain management, customer behavior, marketing campaign performance, workflow procedures, and so on.
According to Gartner, by 2020 there will be 250 million connected vehicles on the road, enabling new in-vehicle services and automated driving capabilities. In addition, there will be 25 billion connected things in use by then. Cisco thinks the figure will be closer to 50 billion devices, while Morgan Stanley believes the number could be as high as 75 billion and also claims that there are some 200 unique types of consumer devices or equipment that could be connected to the Internet.
But how will business people be able to deal with this massive explosion of smart, connected “things” if they don’t at least understand the basics?
Data Analytic Thinking – Think Big
With so much data being collected and bought, companies are focused on creating competitive advantages by exploiting this data. When faced with a business problem, you should be able to assess whether and how data can improve performance. Data scientists need to be part statistician, part hacker, part engineer, part data analyst, part business consultant, part artist, part storyteller. The data scientist needs to be able to talk with multiple parts of the business, gather all the data, connect the dots, spot the most relevant insights and then translate them into actionable suggestions. Understanding the fundamental concepts, and having frameworks for organizing data-analytic thinking, will help you envision opportunities for improving data-driven decision-making and recognize data-oriented competitive threats.
For example, companies can use the insights they gather to improve customer engagement and retention strategies or to create new products and services.
Business Problems solved by Data Science
Ultimately, data science matters because it enables companies to operate and strategize more intelligently. It is all about adding substantial enterprise value by learning from data. One very important aspect of data science is predictive analytics. With the help of predictive analytics you can estimate churn (who will stay and who will go?), LTV, or lifetime value (what will my September revenue be?), and NBO, or next best offer (which merchandise is a user most likely to purchase?). This leads to another very important issue data scientists deal with: data mining, or in other words, making sense of the data and recognizing behavioral patterns.
Data mining is an analytical process designed to explore large amounts of data. It is especially important for business managers because the data mined is usually marketing or business data. Data mining is mainly used to analyze user behavior by searching for patterns or systematic relationships between variables, and then validating the findings by applying the detected patterns to new subsets of data. The ultimate goal is prediction. Generally speaking, people who currently behave the same way other people did in the past are likely to take the same future actions that the original group took. Take shopping cart abandonment as an example: say your average abandonment rate has been 60%, but in the past people who were associated with three specific variables had only a 40% abandonment rate. We can assume that other people who can be associated with those three variables today will probably show the same 40% abandonment rate. These variables could be demographic, like gender and age, or behavioral, like purchasing specific items or clicking on certain links.
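To make the shopping cart example concrete, here is a minimal Python sketch of the arithmetic. Everything in it is an assumption made for illustration: the `CartSession` record, the three variables chosen (age group, returning customer, clicked a promo link) and the ten sample sessions are hypothetical, picked only so that the numbers match the 60% and 40% rates mentioned above. A real analysis would work on far more data and would validate the segment rate against a held-out subset before trusting it.

```python
from dataclasses import dataclass


@dataclass
class CartSession:
    """One historical shopping session (hypothetical fields)."""
    age_group: str
    returning_customer: bool
    clicked_promo_link: bool
    abandoned: bool


def abandonment_rate(sessions):
    """Share of sessions that ended with an abandoned cart."""
    sessions = list(sessions)
    if not sessions:
        raise ValueError("no sessions to measure")
    return sum(s.abandoned for s in sessions) / len(sessions)


def in_segment(session):
    """The three variables that define the segment (illustrative choice)."""
    return (
        session.age_group == "25-34"
        and session.returning_customer
        and session.clicked_promo_link
    )


# Made-up history: 10 sessions, 6 abandoned overall, 2 of 5 in the segment.
past_sessions = (
    [CartSession("25-34", True, True, True)] * 2
    + [CartSession("25-34", True, True, False)] * 3
    + [CartSession("45-54", False, False, True)] * 4
    + [CartSession("45-54", False, False, False)] * 1
)

overall_rate = abandonment_rate(past_sessions)
segment_rate = abandonment_rate(s for s in past_sessions if in_segment(s))

# Prediction: apply the historical segment rate to today's matching carts.
todays_segment_carts = 50
expected_abandoned = segment_rate * todays_segment_carts

print(f"overall abandonment rate: {overall_rate:.0%}")
print(f"segment abandonment rate: {segment_rate:.0%}")
print(f"expected abandoned carts today: {expected_abandoned:.0f}")
```

The point of the sketch is the last step: the pattern found in past data is reused as a forecast for new people who match the same variables.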
Over the past few years, data warehousing capabilities have evolved tremendously to meet enterprise standards, addressing demands such as velocity, variety and volume.
Querying – A data warehouse needs to be capable of dealing with repetitive queries. Repetitive queries support dashboards and reporting requirements when serving a large number of visitors.
Scale – This applies to multiple data structures and formats. You need a data warehouse that can deal with large amounts of data in order to address the management of query workloads and query optimization.
Real-time loading – In today’s world, bulk and batch loading remain the most common methods. More advanced data warehouse technologies are moving to continuous loading, which means that data is loaded from operational sources in real time. This enables you to ingest streaming data and perform updates for read optimization.
Machine learning is the science of getting computers to act without being explicitly programmed. Machine learning has given us self-driving cars, practical speech recognition and effective web search. The process of machine learning is similar to that of data mining: both search through data to look for patterns. However, instead of extracting data for data scientists to interpret, machine learning uses that data for the computer’s own use. Machine learning programs detect patterns in data and adjust program actions accordingly.
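As a rough illustration of "detect patterns and adjust actions accordingly", the Python sketch below uses a nearest-centroid classifier, one of the simplest learning methods, chosen here only because it fits in a few lines. The features (pages viewed, minutes on site), the labels and the training rows are all made up for the example, and the "offer a discount" action is a hypothetical business rule, not something a particular product does.

```python
def fit_centroids(samples):
    """Learn one average point (centroid) per label from labeled examples."""
    sums, counts = {}, {}
    for features, label in samples:
        totals = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            totals[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: [total / counts[label] for total in totals]
        for label, totals in sums.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to the new observation."""
    def squared_distance(centroid):
        return sum((c - f) ** 2 for c, f in zip(centroid, features))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Made-up history: (pages viewed, minutes on site) and what the visitor did.
training = [
    ([2, 1], "leaves"),
    ([3, 2], "leaves"),
    ([1, 1], "leaves"),
    ([8, 10], "buys"),
    ([9, 12], "buys"),
    ([7, 11], "buys"),
]

centroids = fit_centroids(training)


def choose_action(features):
    """Hypothetical rule: nudge visitors who look likely to leave."""
    if predict(centroids, features) == "leaves":
        return "offer discount"
    return "no action"


print(choose_action([2, 2]))
print(choose_action([8, 9]))
```

Nobody wrote a rule such as "fewer than five pages means the visitor leaves". The boundary comes from the data, and refitting on new sessions moves it without any change to the code.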
The market is no longer what it used to be. Almost every industry is being affected by the sheer volume and ubiquity of Big Data, and no business is immune. A lack of data science knowledge on the business manager’s side is all the more damaging because data science now supports bottom-line decision making.
Firms where the business people do not understand what the data scientists are doing are at a substantial disadvantage, because they waste time and effort or, worse, because they ultimately make wrong decisions.