Data loading into Amazon Redshift simplified: The Podcast, part 1


You can hear the whole podcast at this link.

There are two sides to everything, and in the case of software feature development, always at least two stories to be told: that of the business person who requires the feature, and that of the developer who creates and maintains it.

Treasure Data has added support for exporting query results to Amazon Redshift. Loading data into Redshift via Treasure Data is dead easy: data can come from any source, it's schemaless, and result export is as simple as adding your Redshift connection details and making one click.

In this post, covering the first part of our podcast, we’ll discuss the business case, with Treasure Data Sales Engineer Prakhar Agarwal;  in the second part, we’ll talk implementation specifics with architect and feature developer Sadayuki Furuhashi.



Below is a summary of the questions and answers.


Treasure Data:  What are the main motivations for Amazon Redshift output from Treasure Data?

Prakhar: A lot of people use Redshift as a parallel processing database, but over time they realize their dataset in Redshift keeps growing. Eventually the data has grown so much that the costs skyrocket.

Our idea is to use Treasure Data as an intermediate step: you store your raw data in TD, run queries on TD, and push the query results to Redshift. This process significantly reduces your Redshift cost. It also helps when you're looking at aggregated data in a BI tool on Redshift and want to drill into a specific detail: you pick a data point in your connected BI tool, and you can easily retrieve the full details for that one data point from Treasure Data.
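As a rough sketch of the workflow Prakhar describes, a query run through the td command-line tool can aggregate raw events stored in Treasure Data and push only the small result set into Redshift. All database, table, host, and credential names below are placeholders, and the exact result-URL syntax should be checked against your account's Treasure Data configuration:

```shell
# Aggregate raw event data held in Treasure Data, then export the
# (much smaller) result table into Amazon Redshift in a single step.
# Every name and credential here is a placeholder.
td query -d my_raw_events -w \
  --result 'redshift://username:password@my-cluster.example.com:5439/analytics/daily_totals' \
  'SELECT TD_TIME_FORMAT(time, "yyyy-MM-dd") AS day, COUNT(1) AS events
   FROM pageviews
   GROUP BY TD_TIME_FORMAT(time, "yyyy-MM-dd")'
```

The raw pageview data stays in Treasure Data; only the daily totals land in Redshift, which is what keeps the Redshift footprint (and bill) small.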

TD:  What are the pain points from customers that have motivated the move on our part?

Prakhar: It’s the ease of use our system brings, along with the various data export options it provides. Right now, loading data into Redshift is difficult and tedious. For us, adding Redshift result export completes the analytics pipeline, and it lets customers export data there simply and easily, in a single step.

Treasure Data will simplify this process: you keep your raw data, and you significantly reduce your Redshift cost. It’s a win-win for both sides.

TD:  What would the target audience for Amazon Redshift result output be?

Prakhar: I would say gaming companies and ad-tech companies, for example. Their data grows over time, and with it their scaling and Redshift costs. We recently had a call with a company whose Redshift bill had skyrocketed to $20k/month! Our solution comes at the perfect time.

TD:  What kind of people are we talking to right now to grow the momentum around using Treasure Data result export into Amazon Redshift? Who (both individual roles and actual companies) could use Treasure Data as a funnel into Redshift?

Prakhar: As far as individual roles go, anyone who’s building a data infrastructure: a data engineer, a VP of Marketing, or even a VP of Engineering. Just yesterday, I was looking at companies that had both a VP of Engineering and a VP of Data; the latter is responsible for providing a big data platform for the company. These are the kinds of folks who think in terms of both technology and money. They don’t want to spend a lot of money on a solution that does only one thing. Treasure Data was less expensive than the other options they were considering…and it also delivers a lot of things they weren’t expecting at all.

In terms of industries, it’s a lot of people who want to run high-performance queries, and a lot of people interested in monitoring or alert-based systems, including IoT. It’s costly to store all that data in Redshift, though, so they might use TD as the raw data store, aggregate it over time, and then periodically push it to Redshift, where it can be consumed by a BI tool.

TD: Would this be similar to a “lambda architecture,” where we use one store for short-term, low-latency data and another for longer-term, historical (but higher-latency, slower) queries?

Prakhar: Absolutely.  And what will happen is that over time, these two worlds will collide.  So you’ll have one system which can do both things – and we are moving in that direction now.

If you need to import data into Amazon Redshift easily, follow the link below and request a quick tour of our product!

John Hammink
John Hammink is Chief Evangelist for Treasure Data. An 18-year veteran of the technology and startup scene, he enjoys travel to unusual places, as well as creating digital art and world music.