Sorry Guys, It’s Not Just About Big Data
McKinsey christened it, Forbes is writing about it, the Harvard Business Review just devoted 22 pages to it, and the tech press is alive with it – “it” is big data, of course. Treasure Data grew up with big data, with distributed storage and processing architectures, and with handling thousands of concurrent users – we get big data. We also understand that big data is not, in itself, an answer to anything. That’s why Treasure Data is more than just a Hadoop-in-the-Cloud company: we are focused on how organizations of all sizes can make big data work for their business through analytics. When we designed our service, we looked at four key issues that needed to be addressed to make Hadoop-powered Cloud Data Warehousing a reality.
- Easy Data Collection – it’s imperative that data can be collected from any source, quickly and easily. To achieve this, we use td-agent, a version of Fluentd, our popular open source lightweight data collection daemon. td-agent can be run in a parallelized architecture to enable scalable data collection and provide batch or continuous data feeds into the Treasure Data Cloud Data Warehouse.
- Fast Analytic Processing – Hadoop is great at storing large amounts of data, but it was designed for batch data processing, not for analytic processing in a near real-time environment. We therefore took the most proven analytic architecture (column-store) and developed our own analytic database to provide the level of performance needed for iterative, near real-time analysis of big data.
- Flexible Business Intelligence Options – while being a data scientist may be “The Sexiest Job of the 21st Century” 1, there simply aren’t enough data scientists to go around. What’s more, not every query or report needs to be written by a data scientist. In fact, most queries and reports can be developed using off-the-shelf BI tools or even a spreadsheet. That’s why we added a SQL layer to Hive with JDBC and ODBC connectivity, which supports hundreds of different BI tools.
- Data Warehousing as a Service – we’re proud of our technical innovation, but our real breakthrough is not our technology; it’s that we make it available as a service on the proven and scalable Amazon architecture. This approach enables customers to go live in days, not months, and without spending tens of thousands of dollars on Hadoop consultants.

Yes, we love big data and we love Hadoop, but neither is an answer in itself. The answer lies in harnessing analytics to big data, and that is what Treasure Data is all about.

1 Harvard Business Review, October 2012