Raw SQL Access to Segment.com Data in Real-time

What’s Segment?

Segment is a cloud service that simplifies the process of integrating cloud services into your website and mobile apps. You could call it a “tag management 2.0” tool that integrates multiple marketing and product management analytics tools through a single SDK. Once the raw data is flowing through Segment, it can be forwarded to any service they support with just a flip of a switch.

Why Integrate Segment with Treasure Data?

…because real-time, direct access to raw events with SQL as a service is awesome.

To be sure, Segment offers its own wonderful Warehouses product that gives SQL access to raw events inside the customer’s data warehouse. The main differences between Segment Warehouses and Treasure Data are summarized in the following table:

| Aspect                    | Segment Warehouses                                           | Segment/Treasure Data integration |
| Data update latency       | 24 hours in free tier, 1 hour for the highest paying tier   | 1 minute                          |
| Schema management         | Schema-on-write                                              | Schema-on-read                    |
| Infrastructure managed by | Customer                                                     | Treasure Data                     |

For Redshift enthusiasts out there, I also wanted to add that Treasure Data can output query results into multiple databases, including Redshift.

In the following sections, we will show how to send events from Segment to Treasure Data in two ways: Using the paid integration (for Segment’s Growth tier and above) and the free webhook (for everyone else).

Segment to Treasure Data via the Official Integration

Since mid-2015, Treasure Data has supported a direct integration with Segment. This means that if you are an existing Segment user with a Treasure Data API key (here is how to get one), you can start sending all raw Segment events to Treasure Data with a flick of a switch. You can see the step-by-step guide on our documentation page.

Unfortunately, this approach only works if you are on the Growth plan or above on Segment. This is too bad because we’d love to see more Segment users try our service.

But here’s some good news: Thanks to Segment’s free webhooks integration and our open-source log collector Fluentd, you can start using Treasure Data from all tiers on Segment, including the free one.

Segment to Treasure Data via Webhooks and Fluentd

The first step is to set up a small service to receive events from Segment’s Webhooks and route them to Treasure Data. For convenience, I created a small app that can be deployed on Heroku with a Heroku button (Here is the GitHub link). Once your service is up and running, it’s just a matter of pointing Segment’s Webhooks to the service’s URL like this:

To interpret the URLs:

  1. segment-td.herokuapp.com is the hostname of the Fluentd-powered event routing service.
  2. The first part of the path, flat or raw, tells the service whether to de-nest JSON events or store them as-is.
  3. The second part of the path, segment, is the name of the Treasure Data database where you’ll be storing events.
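
To make the URL layout concrete, here is a rough Python sketch of the routing logic described above. It is not the code behind the Fluentd-powered service; the function names, the https scheme in the usage note, and the underscore used to join nested keys are all assumptions made for illustration.

```python
from urllib.parse import urlsplit


def parse_webhook_url(url):
    """Split a webhook URL into (hostname, mode, database).

    Hypothetical helper: it only mirrors the three-part layout
    described in the list above.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != 2 or segments[0] not in ("flat", "raw"):
        raise ValueError("expected a path of the form /flat/<db> or /raw/<db>")
    mode, database = segments
    return parts.hostname, mode, database


def flatten(event, sep="_", prefix=""):
    """De-nest a JSON-like dict into a single level.

    The separator is an assumption; the real service may name
    flattened columns differently.
    """
    flat = {}
    for key, value in event.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, sep, name))
        else:
            flat[name] = value
    return flat


def prepare(url, event):
    """Return (database, record) the way the path segments suggest."""
    _, mode, database = parse_webhook_url(url)
    record = flatten(event) if mode == "flat" else event
    return database, record
```

With these helpers, calling prepare("https://segment-td.herokuapp.com/flat/segment", event) would yield the database name segment together with a de-nested copy of the event, while the raw variant of the path would hand the event back untouched.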

Treasure Data customers might wonder if/where they can specify table names. Tables are generated based on Segment events. See here for further details.

What’s Next?

If you have not already, sign up for Treasure Data today or drop us a line. Start collecting and analyzing more customer data without spending too much time on wiring data pipelines =)

Kiyoto Tamura
Kiyoto began his career in quantitative finance before making a transition into the startup world. A math nerd turned software engineer turned developer marketer, he enjoys postmodern literature, statistics, and a good cup of coffee.