Evaluating Analytics SaaS’s Raw Data Access Capabilities
Why Raw Data Access for Analytics SaaS?
If you are a data analyst or a data-driven product manager, you have probably hit the limits of your analytics tools at some point. For example:
- You noticed that a particular customer’s product usage spiked, and you wanted to track them event by event, but your tool doesn’t let you.
- You wanted to correlate feature usage with account contract value (ACV) to see the monetary value of a particular feature, but you couldn’t because the data is siloed in Salesforce and Mixpanel.
- You realized that you set properties improperly in your analytics tool, and you had no hope of correcting your mistake without getting the raw data and replaying the past.
The key to solving these problems is export APIs. Using export APIs, you can access raw data and make it available on your own systems such as SQL databases and Amazon S3.
In other words, choosing analytics tools with good export APIs will save you much frustration and anger in the future.
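To make the second scenario above concrete, here is a minimal sketch of what becomes possible once raw events sit next to your account data in a SQL database. Everything in it is an assumption for illustration: the `events` and `accounts` tables, their columns, the sample rows, and the use of SQLite as a stand-in for whatever database you actually export to.

```python
import sqlite3


def acv_by_feature(conn):
    """Return (feature, accounts using it, average ACV) rows, highest ACV first."""
    query = """
        SELECT e.feature,
               COUNT(*)   AS accounts,
               AVG(a.acv) AS avg_acv
        FROM (SELECT DISTINCT account_id, feature FROM events) AS e
        JOIN accounts AS a ON a.account_id = e.account_id
        GROUP BY e.feature
        ORDER BY avg_acv DESC
    """
    return conn.execute(query).fetchall()


# Hypothetical schema: raw events from the analytics tool, ACV from the CRM.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (account_id TEXT, feature TEXT, ts TEXT);
    CREATE TABLE accounts (account_id TEXT PRIMARY KEY, acv REAL);
""")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("acme", "dashboards", "2015-06-01T10:00:00"),
        ("acme", "dashboards", "2015-06-01T11:00:00"),
        ("acme", "alerts", "2015-06-02T09:00:00"),
        ("globex", "dashboards", "2015-06-03T12:00:00"),
    ],
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [("acme", 50000.0), ("globex", 10000.0)],
)

result = acv_by_feature(conn)
for feature, accounts, avg_acv in result:
    print(feature, accounts, avg_acv)
```

The inner `SELECT DISTINCT` collapses repeated events so that each account counts once per feature; without it, heavy users would skew the average ACV.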
Evaluating Export APIs: SQL-Support, Latency, Accessibility and Pricing (SLAP)
So, which analytics SaaS should you choose as a growth-phase company? Because I was curious about this myself as Head of Marketing, I set out to evaluate how different analytics SaaS vendors enable their customers to export raw data. For me, the evaluation came down to four criteria: SQL-Support, Latency, Accessibility and Pricing (SLAP).
- SQL Support: Does the vendor support SQL access to data? SQL access is valuable for the more advanced questions that dashboards and reports can’t answer. It’s the 20% of the proverbial 80-20 in analytics needs. To satisfy this criterion, the vendor must provide a managed SQL interface and/or make SQL access possible without heavy engineering work.
- Latency: What’s the freshest data you are guaranteed to get? The keyword here is “guaranteed”: In order to automate the process of extracting raw data (which you will eventually want to do), you need to know the exact latency of your feed so that you won’t duplicate or drop data.
- Accessibility: How accessible is the export API? Does it support your programming language? Does it require programming at all? How good is the documentation? Admittedly, this criterion is the most subjective of all, but it’s so critical for the productivity of engineering and data teams that I decided to include it.
- Pricing: How much does it cost to gain access to raw data? You would think that your vendor is going to give you access for free, but as you will see, this is not always the case.
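To show why the “guaranteed” part of the Latency criterion matters for automation, here is a minimal sketch of how a scheduled export job might pick its time window. The function name, the 12-hour latency and the timestamps are all hypothetical and not taken from any vendor’s API; the idea is simply to export only data older than the guaranteed latency and to start each run exactly where the previous one ended.

```python
from datetime import datetime, timedelta, timezone


def next_export_window(last_end, now, guaranteed_latency):
    """Return the half-open window [start, end) that is safe to export, or None.

    Data newer than now - guaranteed_latency may still be incomplete, so it
    is left for a later run. Starting at last_end means consecutive windows
    neither overlap (duplicates) nor leave gaps (dropped data).
    """
    safe_end = now - guaranteed_latency
    if safe_end <= last_end:
        return None  # nothing new is guaranteed to be complete yet
    return (last_end, safe_end)


# Hypothetical values: a feed guaranteed to be complete after 12 hours.
latency = timedelta(hours=12)
last_end = datetime(2015, 6, 1, 0, 0, tzinfo=timezone.utc)
now = datetime(2015, 6, 2, 6, 0, tzinfo=timezone.utc)

window = next_export_window(last_end, now, latency)
print(window)

# Running again immediately, from where the last window ended, exports nothing.
print(next_export_window(window[1], now, latency))
```

If the vendor only gives you an average latency rather than a guarantee, there is no safe value to plug in for `guaranteed_latency`, and you are left choosing between waiting longer than necessary and risking missing late-arriving events.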
I evaluated the following analytics SaaS vendors: Google Analytics, Adobe Analytics (SiteCatalyst), Mixpanel, KISSMetrics, Amplitude and Segment. I will go deeper into each in the following section, but here is the summary with letter grades A through D.
| Service Name | SQL Support | Latency | Accessibility | Pricing |
Notably, we omitted Flurry and Localytics as both of them seemed to require contacting support to even learn about their export APIs in detail.
Google Analytics
- SQL Support: Google Analytics Premium customers have direct access to Google Analytics’s raw data inside Google BigQuery, Google’s cloud data warehouse that competes with Amazon Redshift (which, as you will see, is used by other vendors to power their SQL interfaces). A.
- Latency: While I could not find the exact documentation, user interviews revealed that data feeds come in once a day for the previous day. As you will see, this is typical of raw data export APIs. C.
- Accessibility: The BigQuery integration is nice, but the moment you try to get the data out of Google’s ecosystem, there’s very little help from Google. That said, Google Analytics has a mature third-party ecosystem that helps you hack around this. This blog is one example. C.
- Pricing: In order to access raw data, you must be a Google Analytics Premium customer, paying $150,000 per year. D.
Adobe Analytics (SiteCatalyst)
- SQL Support: As far as I know, there is no SQL access for Adobe Analytics. D.
- Latency: While I could not find the exact documentation, multiple users have said that data export takes place daily for the previous day’s data. As late as 2012, it took up to 72 hours to get access to raw data via their Data Warehouse API. C.
- Accessibility: Adobe supports exporting data to FTP servers and Amazon S3 periodically but not without limitations. A real-time data feed firehose is also available, probably at an additional cost. The ecosystem around Adobe Analytics is very closed and specialized. Furthermore, judging from their presence on GitHub, the developer ecosystem around it is lackluster. C.
- Pricing: Adobe Analytics is not cheap: it starts at $50,000/year and goes up quickly for additional features. D.
Mixpanel
- Latency: Their mobile data can be delayed up to 5 days per their documentation. This is shocking, especially because Mixpanel is well known for being real-time on their platform. D.
- Accessibility: While they have thorough documentation, exporting data out of Mixpanel requires a deep understanding of their export API and is prone to error. We know this to be the case because Treasure Data built a native data connector to let our customers export data from Mixpanel, and it was decidedly not easy (for example, note their word of caution on timestamps). D.
- Pricing: Available even for free users! A.
KISSMetrics
- SQL Support: Unavailable. Also, unlike Mixpanel, there’s no way to run custom queries on your data. D.
- Latency: Their documentation states that data is exported to the customer’s S3 bucket every 12 hours. This is better than, for example, Mixpanel. C.
- Accessibility: They allow you to export to S3 only, which can be an issue for businesses without AWS-based infrastructure. That said, unlike Mixpanel, they have good tooling for raw data export, including automated export, detailed documentation and a data transformation library for Ruby. B.
- Pricing: KISSMetrics does not have a free tier, but the export API is available for all users. A.
Amplitude
- SQL Support: Yes, via dedicated Redshift instances. A.
- Latency: While they have no public information about data latency, their staff and users told me that the average latency is 1 to 2 hours, which is pretty good. If I had to guess, they probably batch their reads out of their Kafka clusters for raw data access. A.
- Accessibility: Their API is pretty simple, which is both good and bad. Largely because they are new to the market, there aren’t as many tools to help with exporting data. This is a great candidate for an Embulk input plugin. C.
- Pricing: Amplitude is free up to 10 million events per month, and yes, you can export data for free. A.
Segment
- SQL Support: Yes, via their Warehouses product, which lets you access Segment data in your own Redshift/PostgreSQL databases or in one managed by them. A.
- Latency: There are two kinds of data latency for Segment: raw data access via Amazon S3 and SQL access via Warehouses. The former is available for Growth Plan and above with 1-2 hours of latency. The latter is available to all users, starting with 24 hours of latency, all the way down to 1 hour. Essentially, the more you pay, the earlier you can get your data. A.
- Accessibility: Their documentation is top notch with many options. Nothing mean to say here. A.
- Pricing: They definitely use the user’s desire to access fresh data as a pricing lever: If you want real-time access to your raw data, you would need to use Webhooks as I’ve previously blogged here. B.
All of the above tools have their place in a data analyst’s or product manager’s toolchain. That said, some are easier to get data out of than others, and it’s worth paying attention to your constraints. For example, if you are a cash-strapped startup, Google Analytics Premium’s $150,000/year price is a non-starter. On the other hand, if you are a big enterprise with existing BigQuery usage, the $150,000 might have an excellent ROI.
Finally, if you are interested in gathering product, sales and marketing data in one place for broader and deeper insights, check out Treasure Data!