Log Everything as JSON. Make Your Life Easier

The Story of an Engineer.

Here is an anecdote. I am sure some of you have had a similar experience.

  • Alex, an engineer, logs all kinds of events. Since he is the primary consumer of the logs, the format is optimized for human readability. One canonical example is the Apache access log:

    10.0.1.22 - - [15/Oct/2010:11:46:46 -0700] "GET /favicon.ico HTTP/1.1" 404…
    10.0.1.22 - - [15/Oct/2010:11:46:58 -0700] "GET / HTTP/1.1" 200…

    This looks great: timestamp, URL, HTTP status code… each line gives Alex plenty of information to work with if the service is having issues.
  • Bob, a business analyst, asks Alex for the number of daily unique users. Alex writes a parser for the Apache log and crontabs the script. He also builds a little Web interface so that his colleague can query the parsed data on his own. Bob finds the interface super useful.
  • Bob comes back a few weeks later and complains that the web interface is broken. Alex scratches his head and takes a look at the logs, only to realize that someone added an extra field to each line, breaking his custom parser. He fixes the parser and tells Bob that everything is okay again. Instead of building new features, Alex has to go back and backfill the missing data.
  • Every three weeks or so, the previous step repeats itself.
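Alex's cron script might have looked something like this (a hypothetical sketch; the hard-coded field positions are exactly what breaks when someone adds a column to the log):

```ruby
require "set"
require "date"

def daily_unique_users(lines)
  counts = Hash.new { |h, k| h[k] = Set.new }
  lines.each do |line|
    # Assumes the classic layout: ip - - [day/mon/year:time zone] "request" status ...
    m = line.match(%r{^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4}):})
    next unless m
    day = Date.parse(m[2].tr("/", " "))  # "15/Oct/2010" -> 2010-10-15
    counts[day] << m[1]                  # de-duplicate visitors by IP
  end
  counts.transform_values(&:size)
end

logs = [
  '10.0.1.22 - - [15/Oct/2010:11:46:46 -0700] "GET /favicon.ico HTTP/1.1" 404',
  '10.0.1.22 - - [15/Oct/2010:11:46:58 -0700] "GET / HTTP/1.1" 200',
]
daily_unique_users(logs)  # one day, one unique visitor
```

The regex is pinned to the exact number and order of fields, so any change to the line format silently turns every line into a parse failure.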

What is Wrong With This?

The takeaway lesson of the above story is twofold: (1) logs are not just for humans to read and (2) logs change.

(1) Logs are not just for humans. As Paul Querna points out, the primary consumers of logs are shifting from humans to computers. This means log formats should have a well-defined structure that can be parsed easily and robustly.

(2) Logs change. If the logs never changed, writing a custom parser might not be too terrible. The engineer would write it once and be done. But in reality, logs change. Every time you add a feature, you start logging more data, and as you add more data, the printf-style format inevitably changes. This implies that the custom parser has to be updated constantly, consuming valuable development time.

Enter JSON!

Here is a suggestion: Start logging your data as JSON.

JSON has a couple of advantages over other formats.

  • Widely adopted. Most engineers know what JSON is, and there is a JSON library for every language imaginable. This means there is little overhead to parse logs.
  • Readable. Readability counts because engineers have to jump in and read the logs if there is a problem. JSON is text-based (as opposed to binary) and its syntax is a subset of JavaScript's object literal notation, which most engineers are familiar with. In fact, properly formatted JSON is easier to read than logs formatted in ad hoc ways.
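The robustness point is worth making concrete. Because each JSON log line is a self-describing object, a consumer reads just the fields it needs and ignores everything else (a minimal sketch; the field names are made up for illustration):

```ruby
require "json"
require "set"

lines = [
  '{"host":"10.0.1.22","method":"GET","path":"/","status":200}',
  # A later release added a "referrer" field -- the consumer below
  # keeps working without any changes.
  '{"host":"10.0.1.22","method":"GET","path":"/faq","status":200,"referrer":"/"}',
]

unique_hosts = Set.new
lines.each do |line|
  event = JSON.parse(line)      # structured, no regex needed
  unique_hosts << event["host"] # unknown fields are simply ignored
end
unique_hosts.size  # => 1
```

The parsing logic never mentions the fields it does not use, so new fields cannot break it.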

An Example of JSON-based Logging: Fluentd

We’ve already talked about Fluentd on this blog, so I won’t bother you with the details. It’s a logging daemon that can talk to a variety of services (e.g., MongoDB, Scribe, etc.).

One of the key features of Fluentd is that everything is logged as JSON. Here is a little code snippet that logs data to Fluentd from Ruby.

require 'fluent-logger'
# some code in between
log = Fluent::Logger::FluentLogger.new(nil, :host => 'localhost', :port => 24224)
log.post('myapp.access', {"user-agent" => user_agent})

Now, suppose you wanted to start logging the referrer URL in addition to user agent. You just need to update the Ruby hash that corresponds to JSON.

require 'fluent-logger'
# some code in between
log = Fluent::Logger::FluentLogger.new(nil, :host => 'localhost', :port => 24224)
log.post('myapp.access', {"user-agent" => user_agent, "referrer" => referrer_url}) # Added a field!

That’s the only change you need to make. All the existing scripts work as before, since all we did was add a new field to the existing JSON.

In contrast, imagine you were logging the same data in a printf-inspired format. Your code initially looks like this:

log = CustomLogger.new
# some code in between
log.post("web.access", "user-agent: #{user_agent} blah blah")

When you decide to log the referrer URL, you update it to:

log = CustomLogger.new
# some code in between
log.post("web.access", "user-agent: #{user_agent} blah blah referrer: #{referrer_url}")

Now, most likely your old parser is broken, and you have to go and update your regex and whatnot.
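That breakage can be sketched concretely (a hypothetical pattern, mirroring the placeholder format above):

```ruby
# The parser written against the old printf-style format, anchored to
# the exact shape of the old line.
OLD_PATTERN = /^user-agent: (\S+) blah blah$/

old_line = "user-agent: Mozilla/5.0 blah blah"
new_line = "user-agent: Mozilla/5.0 blah blah referrer: /index.html"

old_line.match(OLD_PATTERN)  # matches as intended
new_line.match(OLD_PATTERN)  # => nil -- the anchored pattern silently fails
```

The parser doesn't raise an error; it just stops matching, which is why the breakage in the story goes unnoticed until Bob complains.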

We are biased towards Fluentd because we wrote it ourselves. But regardless of what software/framework you choose for logging, you should start logging everything as JSON right away.

Treasure Data
Treasure Data is the only enterprise Customer Data Platform (CDP) that harmonizes an organization’s data, insights, and engagement technology stacks to drive relevant, real-time customer experiences throughout the entire customer journey. Treasure Data helps brands give millions of customers and prospects the feeling that each is the one and only. With its ability to create true, unified views of each individual, Treasure Data CDP is central for enterprises who want to know who is ready to buy, plus when and how to drive them to convert. Flexible, tech-agnostic, and infinitely scalable, Treasure Data provides fast time to value even in the most complex environments.