Raw SQL Access to Segment.com Data in Real-time
What’s Segment?
Segment is a cloud service that simplifies the process of integrating cloud services into your website and mobile apps. You can say it’s a “tag management 2.0” tool to integrate multiple marketing and product management analytics tools with a single SDK. Once the raw data is flowing through Segment, it can be forwarded to any service they support with just a flip of a switch.
Why Integrate Segment with Treasure Data?
…because real-time, direct access to raw events with SQL as a service is awesome.
To be sure, Segment offers its own wonderful Warehouses product that gives SQL access to raw events inside the customer’s data warehouse. The main differences between Segment Warehouses and Treasure Data are summarized in the following table:
 | Segment Warehouses | Segment/Treasure Data integration |
---|---|---|
Data update latency | 24 hours in free tier, 1 hour for the highest paying tier | 1 minute |
Schema management | Schema-on-write | Schema-on-read |
Infrastructure managed by | Customer | Treasure Data |
For Redshift enthusiasts out there, I also wanted to add that Treasure Data can output query results into multiple databases, including Redshift.
In the following sections, we will show how to send events from Segment to Treasure Data in two ways: using the paid integration (for Segment’s Growth tier and above) and using the free webhook (for everyone else).
Segment to Treasure Data via the Official Integration
Since mid-2015, Treasure Data has supported a direct integration with Segment. This means that if you are an existing Segment user with a Treasure Data API key (here is how to get one), you can start sending all raw Segment events to Treasure Data with a flick of a switch. You can see the step-by-step guide on our documentation page.
Unfortunately, this approach only works if you are on the Growth plan or above on Segment. This is too bad because we’d love to see more Segment users try our service.
But here’s the good news: Thanks to Segment’s free webhooks integration and our open-source log collector Fluentd, you can start using Treasure Data from all tiers on Segment, including the free one.
Segment to Treasure Data via Webhooks and Fluentd
The first step is to set up a small service to receive events from Segment’s Webhooks and route them to Treasure Data. For convenience, I created a small app that can be deployed on Heroku with a Heroku button (Here is the GitHub link). Once your service is up and running, it’s just a matter of pointing Segment’s Webhooks to the service’s URL like this:
To interpret the URLs:
- segment-td.herokuapp.com is the hostname of the Fluentd-powered event routing service.
- The first part of the path, flat or raw, tells the service whether to de-nest JSON events or store them as-is.
- The second part of the path, segment, is the name of the Treasure Data database in which you’ll be storing events.
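To make the URL convention above concrete, here is a minimal Python sketch of how a routing service could read the path and decide whether to de-nest an event. This is not the code of the Heroku app itself: the function names, the `https` scheme in the example URL, and the underscore used to join nested keys are all assumptions made for illustration, and the example URL is assembled only from the hostname and path parts listed above.

```python
from urllib.parse import urlparse


def parse_webhook_url(url):
    """Split a webhook URL into (hostname, mode, database).

    Hypothetical helper: assumes the path is exactly /<flat|raw>/<database>.
    """
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) != 2 or parts[0] not in ("flat", "raw"):
        raise ValueError(f"expected /<flat|raw>/<database>, got {parsed.path!r}")
    return parsed.hostname, parts[0], parts[1]


def flatten(event, prefix="", sep="_"):
    """De-nest a JSON-like dict, joining nested keys with `sep`.

    The underscore separator is an assumption, not the service's documented behavior.
    """
    flat = {}
    for key, value in event.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the joined key as the prefix.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def prepare(url, event):
    """Return (database, record): flattened for 'flat', untouched for 'raw'."""
    _, mode, database = parse_webhook_url(url)
    record = flatten(event) if mode == "flat" else event
    return database, record


# Example with a made-up event payload.
example_event = {"type": "track", "properties": {"plan": "free"}}
print(prepare("https://segment-td.herokuapp.com/flat/segment", example_event))
print(prepare("https://segment-td.herokuapp.com/raw/segment", example_event))
```

With flat, nested fields become top-level columns that are easy to query with SQL; with raw, the event keeps its original nesting.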
Treasure Data customers might wonder if/where they can specify table names. Tables are generated based on Segment events. See here for further details.
What’s Next?
If you have not already, sign up for Treasure Data today or drop us a line. Start collecting and analyzing more customer data without spending too much time on wiring data pipelines =)