There is a clever article in this week’s InformationWeek titled Big Data Debate: End Near For Data Warehousing? The article asks a series of questions that can be reduced to “Will Hadoop supplant the enterprise data warehouse and relegate relational databases to data-mart roles?” This is certainly an interesting question. The article then sets up a face-off between the CEO of Platfora and the President of Teradata, each arguing for their preferred option.
Of course, despite some insights, the article provides just enough substance to justify its attention-grabbing headline but no real answers. In fact, it ends with a poll asking readers “which group [of people] is the primary user of your organization’s data?”, an issue that wasn’t even raised in the article.
At Treasure Data, we agree that big data vs. data warehousing is a confusing issue for many people. In fact, the distinction between the two has become so blurred that, for most people we talk to, the terms are interchangeable. Sure, there are nuances around each term, but fundamentally both describe the process of collecting, storing and analyzing large amounts of data.
Here’s what we see in the market:
- Start-ups – these companies, typically web-based, use the term “big data analytics” in exactly the same way that an enterprise company would use the term “data warehouse”. In our experience, they naturally look to the cloud for solutions, so a cloud-based system like Amazon EMR or Treasure Data makes total sense to them.
- SMB – these companies are more focused on business issues than technology issues. Typically they rely on BI solutions that don’t scale well, and therefore they are open to the idea of a quick, easy-to-implement cloud solution rather than an expensive server- or appliance-based data warehouse solution.
- Enterprise – many enterprises have substantial investments in data warehouse, data mart, BI and ETL solutions, but very few are satisfied with the adoption and usage of these systems. Moreover, line-of-business users are very open to the idea of a cloud-based solution if it can co-exist with existing solutions, is quick to implement and lessens their dependency on IT or third-party consultants to provide meaningful analytics.
In our experience, what you call the solution is much less important than what it does and how quickly it does it. Maybe the debate shouldn’t be Hadoop vs. Teradata. Maybe it should be big data as a service vs. either Hadoop or Teradata on premises.