I'm mainly writing this to tame the turmoil that has been spinning around in my head for months. I have grown up and lived in the "old" BI world for 16 years, starting my career as a BI consultant using one of the first TM1 spreadsheet connector versions available. Building OLAP cubes and loading data with some Excel macros, I thought: this is analytical heaven.
I remember some meetings with customers in those early days, arguing about the required granularity of the dimensions. There were times I said that OLAP is about aggregates and that it is sufficient to have months as the most detailed level in the time dimension (I was just afraid that the day level would blow up the database server. TM1, please forgive me, it was 16 years ago).
The funny thing is that these customers are still in business (more or less).
During my journey through the beautiful land of Business Intelligence I met the Data Warehouse (based on RDBMSs), and we became close friends. I personally never met those cool analytical databases like Vertica, Greenplum or Kognitio, just to name a few (what a pity). At the moment I'm getting acquainted with an in-memory, column-based guy, and I think this will also lead to a very close relationship. Writing this very sentence, one thing is for sure: I love data!
During my work as a BI consultant, and even now in my responsibility as VP Product Marketing (maybe now more than years ago), I'm guided by the "data-information-knowledge-wisdom hierarchy" principle (according to R. L. Ackoff, 1989).
This, in combination with the simple definition of the term "Business Intelligence" by Hans Peter Luhn (1958):
“The ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal”
makes me pretty sure that it's not just about having the data; it's about transforming data into information, transforming information into …
Following Luhn's definition, I come to the conclusion that BI is more of a discipline and that Hadoop / Big Data will become another very powerful tool (or set of tools) in a BI environment. If BI means "create information and knowledge" and Hadoop / Big Data helps me fulfill this mission, I will use it. My personal point of view is:
It is not a “vs.”, it’s a new companion!
But what does Hadoop/Big Data mean to BI if it will not replace "traditional" BI? In my opinion the Hadoop/Big Data buzz heralds something completely new. It's not about growing your EDW from hundreds of GBs to tens of TBs to the petabyte scale; it's "you don't have to care about the volume of the data and the velocity of the data arrival", because you can use commodity hardware to store all this data and process it. The third V is not forgotten; I will take care of the variety of data in another post. In my opinion Hadoop provides the possibility to store data even if I don't yet know what I will need it for. I believe data is an asset, so "store now, use later"!
I don't want to repeat the history of how Hadoop/Big Data was born and how it struck the BI crowd. There is just one thing I want to emphasize: Hadoop/Big Data was invented because a company wasn't able to leverage traditional hardware or software to process the data generated by its "daily business". Fortunately the company gave it to the Apache community – thanks!
As a data lover there is always one question: How can I use and analyze the data that I have successfully stored within Hadoop? Nothing could be simpler: I just have to write a MapReduce Java program – that's it. This is not the place to show my simple MapReduce attempts or to discuss the beauty of the concept. Here I have to remind myself that the typical user in a BI environment (me, and I think a few others) does not want to, or is not capable of, doing this, and that even with Big Data analytics the concept of multidimensional analysis is not that outdated. Two white papers underline this point.
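Just to illustrate what "write a MapReduce program" means conceptually, here is a minimal word-count sketch of the map and reduce phases in plain Python. This is not Hadoop code; the function names and sample data are my own, and the framework normally handles the shuffle/sort step between the two phases:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    # Map: emit a (key, value) pair for every word in the input
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # Shuffle/sort: group all emitted pairs by key,
    # then reduce each group by summing its values
    for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield (key, sum(value for _, value in group))

lines = ["big data meets BI", "BI loves data"]
print(dict(reduce_phase(map_phase(lines))))
# → {'bi': 2, 'big': 1, 'data': 2, 'loves': 1, 'meets': 1}
```

In real Hadoop you would express the same two functions as a Mapper and a Reducer class in Java, and the framework distributes them across the cluster.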
One of the most exciting white papers I have read in the last months describes how LinkedIn built its own OLAP engine to satisfy the needs of its customers.
By the way, I love the question from the SAS guy.
Another very interesting white paper about Hadoop and multidimensional analysis can be found here:
This white paper describes how Klout (www.klout.com) uses the capabilities of the Microsoft SQL Server Analysis Services engine in its Hadoop / Big Data environment.
I don’t know why, but I love this paper.
So if it's not a "vs.", how can Hadoop and its Big Data be integrated into existing BI / DWH environments? This integration can be achieved, for example, by using Hive (http://hive.apache.org/). Hive is a data warehouse system that provides a SQL-like query language: HiveQL. I'm sure that in a future post I will write more about this, but for now I just want to mention my favorite HiveQL statement:
CREATE EXTERNAL TABLE …
This statement allows me to use data that is stored on a Hadoop system without copying it into a Hive-managed table.
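A minimal sketch of how that looks in practice (the table name, columns, delimiter and HDFS path are made up for illustration):

```sql
-- The files stay where they are on HDFS;
-- Hive only stores the table metadata.
CREATE EXTERNAL TABLE weblogs (
  ip      STRING,
  ts      STRING,
  request STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/raw/weblogs';
```

Dropping an external table removes only the metadata, not the underlying files, which fits the "store now, use later" idea nicely.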
Writing this post, rereading all the material I have gathered in the last months, trying to remember what I read in the great books by Tom White, Eric Sammer, Dean Wampler et al. or James Gleick, and recalling all the conversations I had (thanks to my colleagues and friends) leads me to the conclusion:
Hadoop can be integrated into the BI landscape, and we will integrate it into our landscape in one way or another. Maybe you recognize the sample data in the following screenshot:
P.S.: The new BI world that we are entering at the moment will only stay "new" until the buzz becomes commodity, and I think this will happen in the near future.