Big Data has become one of those terms you can’t avoid these days, all the rage in and around the business intelligence industry. The overwhelming buzz makes you wonder: is there really something there, or is the hype the result of vendors and consultants trying to make another buck?
What Is Big Data?
The 12-year-old definition of Big Data, still prevalent today, can be summarized in three words: the three V’s, namely volume, velocity, and variety. Big Data now refers to a set of paradigms and technologies that help organizations manage data that arrives in great volume, must be collected and processed at great velocity, and comes from a great variety of sources, both structured and unstructured. Many expand the definition to include other concepts, some to build more solid ground for the industry, others to make their solutions “More Big Data.” Yet the bare-bones definition of Big Data still revolves around the three V words.
While Big Data is a paradigm, and many could say that they have been addressing Big Data challenges for over a decade anyway (using already existing technologies), today the term is used in combination with the more recent technologies supporting this paradigm, such as Hadoop, NoSQL, and MapReduce, to name a few. In this article, we discuss the hype around these technologies rather than the paradigm itself, as it is a known fact that data is becoming bigger, faster and more diverse than ever.
What’s Wrong With Big Data?
Nothing and everything, depending on where you stand (and what you would like to achieve with it). There are many good reasons for investing in Big Data, and many terrible ones: vendors pushing it as a must-have, your employees wanting it on their resumes, as well as simple ignorance and a me-too attitude, to name a few reasons we have experienced and observed.
Companies have been dealing with big-sized data for a long time, and managing it just fine to a certain extent. Unless business needs have significantly changed, business users are disgruntled, or the depth and variety of data has proliferated, investing in such a relatively complex and yet-to-be-proven platform can do more harm than good. Considering that even Gartner puts Big Data between the “Peak of Inflated Expectations” and the “Trough of Disillusionment,” one might consider waiting for the technology to mature before going all in.
Here are a few remarks you may want to consider before going ahead with a sizable investment in Big Data:
According to Forrester, despite millions of dollars of investment, Business Intelligence user adoption stands at 10% in almost two thirds of organizations. It is imperative that such organizations address this challenge, or at least make sure that their Big Data investments will attract the interest of end users, in order to minimize the risk of investing in yet another platform no one benefits from.
Many companies are sold on the idea that Big Data will provide them with the ability to tackle unstructured data, such as Facebook and Twitter posts, call center customer voice records, customer emails, etc. Unless the organization looking to tap into Big Data is receiving thousands of tweets or complaints a day, using Big Data to analyze such information is akin to using a Patriot missile to kill a mosquito. The return on investment would almost certainly be negative, with better results obtained just by employing people to manually “structure” the data.
Another common selling point for Big Data is the ability to handle machine-generated logs and web logs, which is rather curious, because as far as such machines go, nothing they produce can in any way be called “unstructured.” We have yet to come across a machine that decides to change the format of the log files it produces every day. It is true that log files have a more complex nature than a tab-separated file or a database table, but the BI universe has been processing them for decades now, and what Big Data brings in is mainly some convenience (at the expense of making many other things much harder).
Last, but not least, many buy into the idea of enabling real-time decisions and actions using Big Data. While it is true that a select number of companies have organizational and systems capabilities mature enough to capitalize on real-time analytics, for others it is a wasteful use of resources that could be put toward more immediate needs. In most use cases, having a t-1 (previous-day) data warehouse or data marts and using data mining models that run at the end of the day (instead of in real time) is good enough to achieve the desired results, at a fraction of the cost and complexity. Real-time analytics is a great promise, but it requires numerous other enablers (such as a campaign management system, a nimble organization, customized operational systems, etc.).
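To illustrate the remark above about machine-generated logs, here is a minimal sketch of how a fixed log layout can be parsed into ordinary columns with nothing more than a regular expression. We assume the web server's Common Log Format purely as an example; the function name `parse_log_line` and the sample line are hypothetical, and a real log source may well use a different layout.

```python
import re

# Assumption: lines follow the Common Log Format, chosen only as an example
# of a log layout that stays the same from one day to the next.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Turn one log line into a dict of named fields, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means no bytes were returned.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Hypothetical sample line, not taken from any real system.
sample = '203.0.113.7 - alice [10/Oct/2013:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
print(parse_log_line(sample))
```

Once each line is a record with named fields, it can be loaded into a regular table like any other structured source.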
So It’s All Just Hype?
No. There are a number of use cases where Big Data technologies make perfect sense and could result in a substantial return on investment, such as:
Unknown or Frequently Changing Requirements: One of the greatest aspects of Big Data is the ability to preserve data sources as a whole, instead of structuring them into a well-defined set of tables and columns. This provides a high degree of flexibility, which no traditional data warehouse solution can match. Often, in a traditional data warehouse scenario, when a business user decides that he needs additional information that is available in the operational systems but not on his reports, he ends up waiting for months to get it, since it is not loaded into the data warehouse and the whole ETL process and table structure need to be updated. Big Data solutions overcome this challenge by simply dumping all data from source systems and defining its structure whenever needed.
Immense Scalability at Lower Cost: One of the greatest promises of Big Data is scalability at lower cost, with technologies such as Hadoop parallelizing immense data storage and querying on relatively low-cost and less-specialized hardware. On the down side, such technologies usually drive up operational and administrative costs, which in some cases results in an even higher total cost of ownership than traditional solutions. Still, for organizations dealing with immense data sets, a feasibility assessment is worth doing, as the move could result in substantial cost savings against traditional data warehousing setups.
“Unknown” Structured Data: Although relatively infrequent, structures of data sources are sometimes unknown during development and need to be discovered when needed. Being able to discover structure on demand is especially useful when the business needs quick results and insights and the BI teams have no time to spend months on source data discovery. With their “dump and ask questions later” approach, Big Data technologies can be of great use in these scenarios.
Readiness and Need for Real-Time Analytics: One of the most common use cases of Big Data technologies nowadays is real-time reporting and marketing. With their relatively better ability to process data quickly, Big Data solutions enable some organizations (especially those that already have the ability to make use of real-time insights) to decide and act faster.
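The “dump and ask questions later” idea behind the first and third use cases can be sketched in a few lines. This is only an illustration under our own assumptions: the raw records, the field names and the helper `read_with_schema` are hypothetical, and a real platform would read from distributed storage rather than an in-memory list.

```python
import json

# Hypothetical raw records, kept exactly as the source system emitted them.
# Note that the records do not all carry the same fields.
RAW_EVENTS = [
    '{"customer_id": 1, "channel": "web", "amount": 40.0}',
    '{"customer_id": 2, "channel": "store", "amount": 15.5, "coupon": "SPRING"}',
    '{"customer_id": 1, "channel": "web", "amount": 9.5, "coupon": null}',
]


def read_with_schema(raw_lines, fields):
    """Apply a structure at read time: keep only the requested fields."""
    for line in raw_lines:
        record = json.loads(line)
        # Fields missing from a record simply come back as None.
        yield {field: record.get(field) for field in fields}


# First question: total spend per customer.
spend = {}
for row in read_with_schema(RAW_EVENTS, ["customer_id", "amount"]):
    spend[row["customer_id"]] = spend.get(row["customer_id"], 0.0) + row["amount"]

# A later question about coupons needs no reload and no table change,
# only a different set of fields applied to the same raw data.
coupons = [
    row["coupon"]
    for row in read_with_schema(RAW_EVENTS, ["coupon"])
    if row["coupon"]
]

print(spend)
print(coupons)
```

The trade-off is that every reader has to cope with missing or inconsistent fields, work that a traditional warehouse does once, up front, in its ETL process.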
What To Do With Big Data?
Before throwing a technology solution at a non-existent business problem, we urge organizations to perform a Big Data Needs Assessment, determining whether what they already have in hand can address their needs and, ultimately, whether the potential returns would justify the investment.