By Dennis Duckworth
A few weeks ago I attended the Strata + Hadoop World Conference in San Jose, which I enjoyed immensely. I found that this conference had a slightly different “feel” from previous Hadoop conferences in terms of how Hadoop was being positioned. Coming from the data warehouse world, I have been sensitive to Hadoop being promoted as a replacement for the data warehouse.
At previous conferences, sponsors and presenters seemed almost giddy in predicting that Hadoop would become the main data storage and analytics platform in the enterprise, taking more and more load from the data warehouse and eventually replacing it completely. This year, there didn’t seem to be much negative talk about data warehouses. Cloudera, for example, clearly showed its Hadoop-based “Enterprise Data Hub” as complementary to the Enterprise Data Warehouse rather than as a replacement, reiterating the positioning and strategy it clarified last year. Maybe this was an indication that the Hadoop market was maturing further, with companies having more Hadoop projects in production and thus more real experience with what Hadoop did well and, just as importantly, what it didn’t do well. Perhaps, too, the data warehouse escaped being the villain (or victim) because the “us against them” camp was distracted by the emergence and perceived threat of other technologies like Spark and Mesos.
The conference was just another data point supporting my hypothesis that Hadoop and other Big Data technologies are complementing existing data warehouses in enterprises rather than replacing them. Another data point (actually a collection of many data points) can be seen in the survey results of The Information Difference Company as reported in the paper “Is the Data Warehouse Dead?”, sponsored by IBM. You can download a copy here.
Reading through this report, I found myself recalling many of the conversations I have had with customers and prospects over the last few years. If you have read some of my previous blog posts, you will know that IBM is a big believer in the power of Big Data. We have solutions that help enterprises deal with the new challenges they face from the increasing size, speed and diversity of data. But we continue to offer and recommend relational database and data warehouse solutions because they are essential for deriving business value from data: they have done so in the past and they continue to do so today.
We believe that they will continue doing so going forward. Structured data doesn’t go away, nor does the need for doing analytics (descriptive, predictive, or prescriptive) on that data. An analytics engine that was created and tuned for structured data will continue to be the best place to do such analytics. Sure, you can do some really neat data exploration and visualizations on all sorts of data in Hadoop, but you still need your daily, weekly, and monthly reports and your executive dashboards, all fueled by structured data and all produced within shrinking time windows.
About Dennis Duckworth
Dennis Duckworth, Program Director of Product Marketing for Data Management & Data Warehousing, has been in the data game for quite a while, doing everything from Lisp programming in artificial intelligence to managing a sales territory for a database company. He has a passion for helping companies and people get real value out of cool technology. Dennis came to IBM through its acquisition of Netezza, where he was Director of Competitive and Market Intelligence. He holds a degree in Electrical Engineering from Stanford University but has spent most of his life on the East Coast. When not working, Dennis enjoys sailing off his backyard on Buzzards Bay, and he is relentless in his pursuit of wine enlightenment. Follow @DennisDuckworth
See also: New Fluid Query for PureData and Hadoop by Wendy Lucas