Learn SQL by Calculating Customer Lifetime Value Part 1: Setup, Counting and Filtering

Learn SQL by Calculating Customer Lifetime Value Part 1: Setup, Counting and Filtering

Motivation

As far as technical skills go, SQL is a really nice skill to have for product managers and product marketers. Instead of constantly running into performance issues in Excel or begging “technical” people to look stuff up for you, you can get answers to your questions directly from data.

Unfortunately, good, non-encyclopedic resources for learning SQL are few and far between on the Internet. Many of them just tell you the syntax without context. Many assume you are already a programmer and/or omit the technical details of setting up the environment.

The approach here is different. In a series of blog posts, I will walk through a simple but common data analysis scenario. Each step introduces a new concept in SQL that’s tied to a natural business question. Also, each query comes with a screenshot or animation of comparable operations in Excel to aid your understanding. By the end of this series, you should feel comfortable getting read-only access to your company’s databases.

Helpful Mnemonic for Excel Veterans

It is helpful to keep the following mental model throughout this tutorial:

Excel SQL
Table Table
Sort ORDER BY
Filter WHERE
Pivot Table GROUP BY
VLOOKUP JOIN
Array Functions Aggregation Functions

Get the FREE e-book based on this blog series!


Scenario

Imagine that you are a product manager at an e-commerce or SaaS startup. For such a business, the critical metric is CLV: Customer Lifetime Value. If you know how much revenue each customer brings to your business over their lifetime, you have a much better idea of how to meet revenue goals and evolve your product.

Get Treasure Data blogs, news, use cases, and platform capabilities.

Thank you for subscribing to our blog!

Also, if you know the CLV for each customer, then you can make more efficient business decisions. Which customer acquisition channel has high CLVs? Which one is losing money? Are there regional differences?

The goal of this tutorial is to compute the average customer lifetime value from two data sources: users data and payments data.

Setup

For this series, I will be using SQLite. SQLite is simple, free, and shipped with recent versions of Mac OSX, which I assume many of you work on. Windows is also supported (See here, for example, on how to download SQLite on Windows).

Also, this might be the first time you are working with the command line interface. I promise that you won’t have to do much besides entering SQL commands.

First, let’s open up Terminal. If you are on a Mac, it’s as simple as searching for it in the search bar.

The first step is to start sqlite3 with pre-populated tables. Download bootstrap.sql and place it in the directory of your choice. Then, in your terminal, run the following command in the same directory:

sqlite3 -init bootstrap.sql

This should start up a SQLite interactive shell like this:

sqlite>

Let’s see what tables (again, they are like the tables in Excel) are there. For sqlite3, typing in “.tables” does the job:

sqlite>.tables
payments users

Great. There are two tables here, “payments” and “users”. Unlike Excel, you need to actually write a query to see what the data looks like.

To do so, we run the simplest SQL query, which just grabs all the columns and rows in a table. DO NOT DO THIS for large tables. But here, the table is small (with just 10 rows), so go ahead and type in “SELECT * FROM users;” and hit enter. Do not forget the trailing semicolon.

sqlite> SELECT * FROM users;
id          campaign    signed_up_on
----------  ----------  ------------
1           facebook    2014-10-01
2           twitter     2014-10-02
3           direct      2014-10-02
4           facebook    2014-10-03
5           organic     2014-10-03
6           organic     2014-10-03
7           organic     2014-10-04
8           direct      2014-10-05
9           twitter     2014-10-05
10          organic     2014-10-05

Here is an Excel equivalent:

select * from users

As you can see, this table has three fields/columns:

  • “id” is the user’s ID. This will be referenced in the “payments” table
  • “campaign” is the campaign used to acquire that user.
  • “signed_up_on” is the date when the user signed up for the website.

Sorting with ORDER BY

Just like Excel’s tables, SQL lets you sort data by one or more columns. To do so, we add ORDER BY <column_names> to your SQL statement.

sqlite> SELECT * FROM users ORDER BY campaign;
id          campaign    signed_up_on
----------  ----------  ------------
3           direct      2014-10-02
8           direct      2014-10-05
1           facebook    2014-10-01
4           facebook    2014-10-03
5           organic     2014-10-03
6           organic     2014-10-03
7           organic     2014-10-04
10          organic     2014-10-05
2           twitter     2014-10-02
9           twitter     2014-10-05

An Excel equivalent is as follows:

learn-sql-order-by

Here is how you can sort by campaign AND signed_up_on.

sqlite> SELECT * FROM users ORDER BY campaign, signed_up_on;
id          campaign    signed_up_on
----------  ----------  ------------
3           direct      2014-10-02
8           direct      2014-10-05
1           facebook    2014-10-01
4           facebook    2014-10-03
5           organic     2014-10-03
6           organic     2014-10-03
7           organic     2014-10-04
10          organic     2014-10-05
2           twitter     2014-10-02
9           twitter     2014-10-05

An Excel equivalent is as follows:

learn-sql-order-by-two-columns

You can also sort in reverse order by adding “DESC”.

sqlite> SELECT * FROM users ORDER BY campaign DESC;
id          campaign    signed_up_on
----------  ----------  ------------
2           twitter     2014-10-02
9           twitter     2014-10-05
5           organic     2014-10-03
6           organic     2014-10-03
7           organic     2014-10-04
10          organic     2014-10-05
1           facebook    2014-10-01
4           facebook    2014-10-03
3           direct      2014-10-02
8           direct      2014-10-05

An Excel equivalent is as follows:

learn-sql-order-by-desc

And of course, “DESC” can be used when sorted by multiple columns.

sqlite> SELECT * FROM users ORDER BY campaign, signed_up_on DESC;
id          campaign    signed_up_on
----------  ----------  ------------
8           direct      2014-10-05
3           direct      2014-10-02
4           facebook    2014-10-03
1           facebook    2014-10-01
10          organic     2014-10-05
7           organic     2014-10-04
5           organic     2014-10-03
6           organic     2014-10-03
9           twitter     2014-10-05
2           twitter     2014-10-02

An Excel equivalent is as follows:

learn-sql-order-by-two-columns-desc

Filtering with WHERE

One of Excel’s most used features is filters. Of course, SQL, also has filters. They are called WHERE clauses and can express an even broader range of filter conditions than Excel’s filters.

Here is the query that fetches all the users that signed up organically:

sqlite> SELECT * FROM users WHERE campaign = 'organic';
id          campaign    signed_up_on
----------  ----------  ------------
5           organic     2014-10-03
6           organic     2014-10-03
7           organic     2014-10-04
10          organic     2014-10-05

An Excel equivalent is as follows:

learn-sql-where

As you can see, we have “WHERE campaign = ‘organic’ in the above query. An analog in Excel is going to the “campaign” column and selecting just ‘organic’.

What if you want to select multiple values? No problem, SQL can handle that with “IN” like this.

sqlite> SELECT * FROM users WHERE campaign IN ('facebook', 'twitter');
id          campaign    signed_up_on
----------  ----------  ------------
1           facebook    2014-10-01
2           twitter     2014-10-02
4           facebook    2014-10-03
9           twitter     2014-10-05

An Excel equivalent is as follows:

learn-sql-where-multiple

The above query fetches all the users that signed up through Facebook or Twitter. As you can see, you can list up multiple fields separated by commas inside the parenthesis.

What if you wanted to fetch all the users EXCEPT the ones that came Facebook or Twitter? This is supported with “NOT IN” like this:

sqlite> SELECT * FROM users WHERE campaign NOT IN ('facebook', 'twitter');
id          campaign    signed_up_on
----------  ----------  ------------
3           direct      2014-10-02
5           organic     2014-10-03
6           organic     2014-10-03
7           organic     2014-10-04
8           direct      2014-10-05
10          organic     2014-10-05

(Note that it’s not easy to “exclude” particular values for a given column in Excel).

Okay, but all the filtering thus far involved a single column. Can SQL filter by multiple columns? The answer is yes, and it uses AND to concatenate filter conditions. The following query fetches all the users that came from Facebook or Twitter campaigns that signed up on Oct. 1, 2014.

sqlite> SELECT * FROM users WHERE campaign in ('facebook', 'twitter') AND signed_up_on = '2014-10-01';
id          campaign    signed_up_on
----------  ----------  ------------
1           facebook    2014-10-01

An Excel equivalent is as follows:

learn-sql-where-multiple-clauses

Now it’s time to show that SQL’s WHERE is more powerful than Excel’s filters. In addition to AND, You can actually use OR to say something like “Get me all the users that signed up before 2014-10-04 OR came in organically”. Here is the query:

sqlite> SELECT * FROM users WHERE campaign = 'organic' OR signed_up_on < '2014-10-04';
id          campaign    signed_up_on
----------  ----------  ------------
1           facebook    2014-10-01
2           twitter     2014-10-02
3           direct      2014-10-02
4           facebook    2014-10-03
5           organic     2014-10-03
6           organic     2014-10-03
7           organic     2014-10-04
10          organic     2014-10-05

(As explained above, this is not easy to do in Excel, either. Most likely, you would need to create an additional column.)

Filtering AND Sorting

As you might have guessed by now, SQL allows you to filter and sort in one pass. The syntax is simple: Have both WHERE and ORDER BY in your query, but make sure WHERE comes before ORDER BY. Here is a query that fetches all the Facebook-/Twitter-sourced users, sorted by campaign.

sqlite> SELECT * FROM users WHERE campaign in ('facebook', 'twitter') ORDER BY campaign;
id          campaign    signed_up_on
----------  ----------  ------------
1           facebook    2014-10-01
4           facebook    2014-10-03
2           twitter     2014-10-02
9           twitter     2014-10-05

learn-sql-where-order-by

What’s Next?

Congratulations! You now know how to do the SQL equivalent of Excel’s sorting and filtering. In the next post, I will show how to do “pivot tables” in SQL. In the meantime, if you are interested in use-case specific SQL query templates, check out our library!

Contact me on twitter @kiyototamura or leave a comment if you have any questions about this.

Continue reading Part 2!

Kiyoto Tamura
Kiyoto Tamura
Kiyoto began his career in quantitative finance before making a transition into the startup world. A math nerd turned software engineer turned developer marketer, he enjoys postmodern literature, statistics, and a good cup of coffee.