Graduate from Mixpanel: Tracking Your Funnel Long Term

Graduate from Mixpanel: Tracking Your Funnel Long Term

This post is part two of a two part series. Read part one here.

As we saw with Part 1, we were able to create a funnel that allowed us to easily change the period of time and events that defined the funnel, and then visualize that funnel using R. However, like most business driven analytics, we want to know if our decisions are positively impacting the area they were built for. In Part 2, we’ll expand the query written in Part 1 to show how our funnel’s performance has changed over a longer period of time.

Editing the Query

Luckily, we do not need to collect any additional data beyond what was outlined in part one. Since the time column is already available to use, we just need to add the month as a grouping variable to our original query, as seen below. Be through when updating your query, each subquery will need some changes.


SELECT 
  MONTH,
  COUNT(homepage) AS homepage_total_count,
  COUNT(put_item_in_cart) AS put_item_in_cart_total_count,
  COUNT(checkout) AS checkout_total_count,
  SUM(CASE WHEN (homepage < put_item_in_cart) THEN 1 ELSE 0 END) AS conversion_1_2,
  SUM(CASE WHEN (homepage < put_item_in_cart AND put_item_in_cart < checkout) THEN 1 ELSE 0 END) 
      AS conversion_2_3

FROM (SELECT 
         MONTH,
         user_id,
         MAX(CASE WHEN (event = 'homepage') THEN first_occurance ELSE NULL END) AS homepage,
         MAX(CASE WHEN (event = 'put_item_in_cart') THEN first_occurance ELSE NULL END) AS put_item_in_cart,
         MAX(CASE WHEN (event = 'checkout') THEN first_occurance ELSE NULL END) AS checkout

      FROM (SELECT 
               user_id,
               event,
               td_time_format(td_date_trunc('month', time), 'yyyy-MM') AS MONTH,
               MIN(time) AS first_occurance
            FROM js_console_events
            WHERE td_time_range(time, '2015-01-01', '2016-01-01')
                  AND event IN('homepage', 'put_item_in_part', 'checkout')
                  AND user_id IS NOT NULL

            GROUP BY user_id, 
                     event,
                     td_time_format(td_date_trunc('month', time), 'yyyy-MM')
           ) a

      GROUP BY MONTH,
               user_id
     )

GROUP BY MONTH
ORDER BY MONTH

Note that if grouping by a month is too large of a time period, this can be done with either weeks or days. In our example, grouping by days can be easily done as months, while grouping by weeks will take a more preparation. I’ve used two more of Treasure Data provided UDFs, td_time_format and td_date_trunc, to find the month the of the event, but most SQL implementations have similar functions.

Running this query will produce output similar to below:

month homepage put_item_in_cart checkout conversion_1_2 conversion_2_3
2015-07 548 145 150 83 9
2015-08 583 136 151 84 8
2015-09 638 166 171 103 15
2015-10 640 179 169 102 13
2015-11 687 200 172 124 28
2015-12 714 166 165 97 15

Visualizing in R

From here, we’ll pop over into R. After reading in the data in your preferred method, we’ll create a few new columns. I’ve called my data frame df for simplicity.

In the graphic we created in part one, we created a line graph of the total counts overlaid on a bar chart of the funnel counts. This time, we’ll remove the information from the overall totals and convert the bar chart into a line chart. In my example, there will be one line per month for the last six months. I find anything more than six time periods gets too chaotic to understand.

Also, last time, we dealt with the conversions in counts. While we could continue this in the long-term graph, I think comparing conversion percentage rates is more meaningful. Additionally, you could use percentages in Part 1 as well, by following the code below before melting the dataset. Calculating the percentages is straight-forward:

Get Treasure Data blogs, news, use cases, and platform capabilities.

Thank you for subscribing to our blog!


df$homepage_1 = yf$homepage_total_count/yf$homepage_total_count * 100

df$put_item_in_cart_2 = yf$conversion_1_2/yf$homepage_total_count * 100

df$checkout_3 = yf$conversion_2_3/yf$homepage_total_count * 100

Notice that because of how we’ve defined the constraints of our funnel, that first conversion rate will always be 100%, since a user does not enter the funnel until their first occurrence of the  the first event in the funnel. I’ve included it here as a reminder in case you’ve defined your funnel steps differently.

If you prefer, you can also calculate these within the query and have less data prep in R. It’s up to you.

We need to transpose the data so we’ll use reshape2’s melt() function again. However, this time, we will only want to grab those three conversion percentage columns we just created along with the month.

 

df_m = melt(df, id.vars = "month", 
            measure.vars = c("homepage_1", "put_item_in_cart_2","checkout_3"))

From here, we’ll create the graph using ggplot2 again:


gg = ggplot(df_m) + geom_point(aes(variable, value, group=month, colour=month), size = 2)
gg = gg + geom_line(aes(variable, value, group=month, colour=month), size = .75)
gg = gg + scale_colour_brewer(palette = "RdPu") 
gg

This time, I’ve kept in most of the formatting in case you want to know my settings. I’ve colored each line by the month, creating a gradient from light to dark as the months go on.

Ideally, we would want to see the darker, more current lines moving towards the top, signifying that the percentage conversion rate is increasing for each step.

In our example, there has not been much dramatic change in our example funnel. This makes it slightly difficult to see how the funnel behaves over time. To help with the graph’s clarity, I decided to scale the y-axis to open up the graph a bit more to allow for a closer look. Adding the following line will scale the y-axis using a Log transformation:

gg = gg + coord_trans(x = "identity", y="log2") 

That’s much better! We now are able to decipher each line’s behavior a bit more clearly. Our example funnel has shown some improvement over the last six months, particularly with a spike in November. However for whatever reason, we see a downturn in December. If this was real data, I would then concentrate on investigating why at the end of the year, people were not making it through the funnel.

I would also encourage you to play around with other transformations. I used a log transformation because it showcased the data in the best light, but this transformation may not be optimal for you.

Conclusion

With the graphs created in both parts of this article, you now have a good foundation to begin utilizing funnel analysis as a part of your analytics pipeline. From here, you can customize and expand on the queries and graphs I’ve shown or set up a regular workflow so these metrics are automatically generated. The bonus here is that you have complete control of the data, the funnel, and the graphs. If you enjoyed this series, make sure to subscribe to the blog as I’ll continue to post analytics and data science techniques.

Diana Shealy
Diana Shealy
Diana Shealy is a Data Scientist at Treasure Data. After graduating from college in 2012, she moved to the Bay Area to begin her career. Outside of work, she enjoys traveling, music, and cooking.