Using Treasure Data at ContextLogic
Today’s entry is by Danny Zhang, a cofounder of ContextLogic. Backed by notable investors, ContextLogic brings together top technical talent from Google, Facebook, and Yahoo!. Their flagship product, Wish, improves online merchandise discovery by letting users find and share cool products and earn rewards. We launched Wish last November, and I am pleased to see that we already have over 1.8 million registered users, making it one of the top 200 Facebook applications worldwide.
We log everything.
At ContextLogic, we make every product decision based on rigorous data analysis. We run a robust A/B testing framework continuously to make informed decisions. From the color of a "Buy" button to title copy, we track and compare the performance of old and new versions to validate or reject our hypotheses.
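The post doesn't describe ContextLogic's framework internals, but a common building block for this kind of continuous A/B testing is deterministic bucketing: hashing a user id together with an experiment name so each user stably lands in the same variant. The function and experiment names below are hypothetical, a minimal sketch rather than their actual implementation:

```python
import hashlib

def assign_variant(user_id, experiment, variants=("control", "treatment")):
    """Deterministically assign a user to an experiment variant.

    Hashing user_id together with the experiment name yields a stable,
    roughly uniform bucket, so a given user always sees the same version
    of a given experiment, while different experiments are independent.
    """
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]

# Same user, same experiment: always the same bucket.
first = assign_variant("user_42", "buy_button_color")
again = assign_variant("user_42", "buy_button_color")
assert first == again
```

Because assignment is a pure function of (user, experiment), no assignment table needs to be stored; the variant can be recomputed anywhere, including at analysis time when comparing logged metrics.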
Because we firmly believe in data-driven product development, logging is our second-highest priority, after serving traffic. Every time our engineers add a new feature to the codebase, I review the code to ensure that the feature comes with thorough logging. When new engineers come on board, I explain to them in detail how critical logging is to us and why.
Treasure Data: how we log everything robustly.
It is one thing to say, "We should log everything" and another to actually do it. In theory, logging is not difficult. In practice, it is surprisingly costly to log data reliably and efficiently. In the past, we tried Scribe, the logging software open-sourced by Facebook. Scribe was performant and robust, but it was resource-intensive to configure, build, and maintain at our scale, and it had compatibility issues with our version of Python.
It really comes down to the question of “How much time do we want to spend on logging?” and the answer is “As little time as possible while logging everything correctly and keeping data accessible.”
Enter Treasure Data.
Treasure Data is a cloud storage/computation platform that provides a HiveQL (Apache Hive) interface. As someone who once cut my teeth setting up Hadoop clusters and running large-scale MapReduce jobs, I immediately saw its value. We signed up for the service and started logging all of our data. Using their Python driver, it was as easy as adding a few lines to our code.
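The post doesn't show the driver's API, so as a rough stand-in, the calling pattern looks something like the standard-library-only sketch below: each call site passes a table name and a dict, and the library timestamps, buffers, and eventually uploads the record. The `log_event` helper and in-memory buffer here are hypothetical, illustrating only the "few lines per call site" shape, not Treasure Data's actual interface:

```python
import json
import time

EVENT_BUFFER = []  # stand-in for the driver's upload queue

def log_event(table, record):
    """Queue a timestamped event record destined for the given table.

    A real cloud-logging driver would batch and ship these over HTTP;
    here we just serialize them to JSON lines in memory to show how
    little code instrumentation adds at each call site.
    """
    entry = {"table": table, "time": int(time.time()), **record}
    EVENT_BUFFER.append(json.dumps(entry))
    return entry

# Instrumenting a feature is one extra line where the event happens:
log_event("purchases", {"user_id": 42, "item": "sunglasses", "price": 9.99})
```

The design point is that call sites stay trivial: engineers decide *what* to log, and the driver owns buffering, retries, and delivery, which is exactly the division of labor described below.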
We have been using Treasure Data for 6 months now, and it's been great. The service is reliable: we log more than 20,000 events (7.2 million line items) per day and have had only 4 bad requests so far. They also provide first-class support, boosting my confidence in the service.
As a lead engineer, I am pleased to see how much time Treasure Data is saving us. Our engineers no longer worry about maintaining a logging framework or troubleshooting Scribe. Instead, they focus on what to log and how to use the logged data to make our products better. Trust me, when you only have 10 engineers or so, the last thing you want is 5 of them scratching their heads over why logging is not working.
Advice to startups
As a startup, we must allocate our limited resources wisely and focus on making a great product. To improve the product, however, you need to log relevant data. This creates a dilemma: how can we log thoroughly and accurately while spending as little time as possible on logging? Treasure Data was our solution, and I strongly recommend it to you as well.