Learn SQL by Calculating Customer Lifetime Value Part 1: Setup, Counting and Filtering

Motivation

As far as technical skills go, SQL is a really nice skill to have for product managers and product marketers. Instead of constantly running into performance issues in Excel or begging “technical” people to look stuff up for you, you can get answers to your questions directly from data.

Unfortunately, good, non-encyclopedic resources for learning SQL are few and far between on the Internet. Many of them just tell you the syntax without context. Many assume you are already a programmer and/or omit the technical details of setting up the environment.

The approach here is different. In a series of blog posts, I will walk through a simple but common data analysis scenario. Each step introduces a new concept in SQL that’s tied to a natural business question. Also, each query comes with a screenshot or animation of comparable operations in Excel to aid your understanding. By the end of this series, you should feel comfortable getting read-only access to your company’s databases.

Helpful Mnemonic for Excel Veterans

It is helpful to keep the following mental model throughout this tutorial:

ExcelSQL
TableTable
SortORDER BY
FilterWHERE
Pivot TableGROUP BY
VLOOKUPJOIN
Array FunctionsAggregation Functions

Get the FREE e-book based on this blog series!


Scenario

Imagine that you are a product manager at an e-commerce or SaaS startup. For such a business, the critical metric is CLV: Customer Lifetime Value. If you know how much revenue each customer brings to your business over their lifetime, you have a much better idea of how to meet revenue goals and evolve your product.

Also, if you know the CLV for each customer, then you can make more efficient business decisions. Which customer acquisition channel has high CLVs? Which one is losing money? Are there regional differences?

The goal of this tutorial is to compute the average customer lifetime value from two data sources: users data and payments data.

Setup

For this series, I will be using SQLite. SQLite is simple, free, and shipped with recent versions of Mac OSX, which I assume many of you work on. Windows is also supported (See here, for example, on how to download SQLite on Windows).

Also, this might be the first time you are working with the command line interface. I promise that you won’t have to do much besides entering SQL commands.

First, let’s open up Terminal. If you are on a Mac, it’s as simple as searching for it in the search bar.

The first step is to start sqlite3 with pre-populated tables. Download bootstrap.sql and place it in the directory of your choice. Then, in your terminal, run the following command in the same directory:

This should start up a SQLite interactive shell like this:
Let’s see what tables (again, they are like the tables in Excel) are there. For sqlite3, typing in “.tables” does the job:
Great. There are two tables here, “payments” and “users”. Unlike Excel, you need to actually write a query to see what the data looks like.

To do so, we run the simplest SQL query, which just grabs all the columns and rows in a table. DO NOT DO THIS for large tables. But here, the table is small (with just 10 rows), so go ahead and type in “SELECT * FROM users;” and hit enter. Do not forget the trailing semicolon.

Here is an Excel equivalent:

select * from users

As you can see, this table has three fields/columns:

  • “id” is the user’s ID. This will be referenced in the “payments” table
  • “campaign” is the campaign used to acquire that user.
  • “signed_up_on” is the date when the user signed up for the website.

Sorting with ORDER BY

Just like Excel’s tables, SQL lets you sort data by one or more columns. To do so, we add ORDER BY <column_names> to your SQL statement.

An Excel equivalent is as follows:

learn-sql-order-by

Here is how you can sort by campaign AND signed_up_on.

An Excel equivalent is as follows:

learn-sql-order-by-two-columns

You can also sort in reverse order by adding “DESC”.

An Excel equivalent is as follows:

learn-sql-order-by-desc

And of course, “DESC” can be used when sorted by multiple columns.

An Excel equivalent is as follows:

learn-sql-order-by-two-columns-desc

Filtering with WHERE

One of Excel’s most used features is filters. Of course, SQL, also has filters. They are called WHERE clauses and can express an even broader range of filter conditions than Excel’s filters.

Here is the query that fetches all the users that signed up organically:

An Excel equivalent is as follows:

learn-sql-where

As you can see, we have “WHERE campaign = ‘organic’ in the above query. An analog in Excel is going to the “campaign” column and selecting just ‘organic’.

What if you want to select multiple values? No problem, SQL can handle that with “IN” like this.

An Excel equivalent is as follows:

learn-sql-where-multiple

The above query fetches all the users that signed up through Facebook or Twitter. As you can see, you can list up multiple fields separated by commas inside the parenthesis.

What if you wanted to fetch all the users EXCEPT the ones that came Facebook or Twitter? This is supported with “NOT IN” like this:

(Note that it’s not easy to “exclude” particular values for a given column in Excel).

Okay, but all the filtering thus far involved a single column. Can SQL filter by multiple columns? The answer is yes, and it uses AND to concatenate filter conditions. The following query fetches all the users that came from Facebook or Twitter campaigns that signed up on Oct. 1, 2014.

An Excel equivalent is as follows:

learn-sql-where-multiple-clauses

Now it’s time to show that SQL’s WHERE is more powerful than Excel’s filters. In addition to AND, You can actually use OR to say something like “Get me all the users that signed up before 2014-10-04 OR came in organically”. Here is the query:

(As explained above, this is not easy to do in Excel, either. Most likely, you would need to create an additional column.)

Filtering AND Sorting

As you might have guessed by now, SQL allows you to filter and sort in one pass. The syntax is simple: Have both WHERE and ORDER BY in your query, but make sure WHERE comes before ORDER BY. Here is a query that fetches all the Facebook-/Twitter-sourced users, sorted by campaign.

learn-sql-where-order-by

What’s Next?

Congratulations! You now know how to do the SQL equivalent of Excel’s sorting and filtering. In the next post, I will show how to do “pivot tables” in SQL. In the meantime, if you are interested in use-case specific SQL query templates, check out our library!

Contact me on twitter @kiyototamura or leave a comment if you have any questions about this.

Continue reading Part 2!

Request a demo
Kiyoto Tamura
Kiyoto Tamura
Kiyoto began his career in quantitative finance before making a transition into the startup world. A math nerd turned software engineer turned developer marketer, he enjoys postmodern literature, statistics, and a good cup of coffee.
Related Posts