The 4 Important Things About Analyzing Data Part 2: Understand the Purpose of the Analysis and Who Needs the Results
Before analyzing data, it is important to first clearly understand for whom and for what purpose you are conducting the analysis.
This is essential because analytics exists to help people make decisions. Therefore, designing the analysis to produce the results best suited to the decision at hand is an important part of the process, as is presenting those results appropriately.
There are cases, for instance at leading data companies such as Amazon and Google, where data is used in such a way that the analysis results themselves make the decisions, for example in recommendation engines, PageRank, and demand forecasting systems. To achieve this, advanced machine learning techniques and statistical models are applied, resulting in mechanically improved (though not necessarily perfect) results from the data. Because these techniques draw on large-scale data sets and reflect their results in real time, they are applied in areas that are beyond the reach of human decision-making.
On the other hand, a fundamental aspect of analytics is decision support – in other words, providing material to support the human decision-making process.
In my role as a data scientist, I am very focused on data analysis that supports people as they make decisions. To do so effectively, it is very important not to overcomplicate the analysis (making it more difficult than necessary) and to present results clearly and succinctly. This might seem to undermine the advances of automated techniques, but when you consider the number of decisions people need to make each day and the fact that most decision-makers are not data specialists, it becomes important to present results in a simple and concise way.
Further, presenting results in a looser, more approximate form is sometimes more valuable than pure statistical accuracy when the goal is to enable informed decisions. For example, a clustering technique will automatically segment users based on their behavior. However, the output would look something like “a user cluster with a billing amount of $5.76 – $28.84 and a use time of 3.4 days – 8.9 days.” Even after obtaining results with such fine-grained parameters, is this knowledge optimal for aiding the human decision-making process?
Conversely, even if it introduces some ambiguity or bias, I believe it is more meaningful for the other party if you present results within easy-to-understand parameters such as “a user segment with a billing amount of $5 to $40 and a use time of 1 day to 7 days.” Additionally, to visualize this properly, I recommend redoing the aggregation according to the perspective and parameters the decision-maker requested.
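As a rough illustration of redoing the aggregation with round-number boundaries, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `User` record, the `summarize` helper, and the six sample users are invented, and only the two pairs of ranges come from the segments quoted above. It simply counts who falls inside a segment, once with the clustering-style boundaries and once with the easy-to-understand ones.

```python
from dataclasses import dataclass


@dataclass
class User:
    user_id: str
    billing_usd: float
    use_days: float


# Hypothetical sample data, invented purely for illustration.
USERS = [
    User("u1", 4.50, 0.5),
    User("u2", 5.76, 3.4),
    User("u3", 12.00, 6.0),
    User("u4", 28.84, 8.9),
    User("u5", 39.99, 7.0),
    User("u6", 55.00, 2.0),
]


def in_segment(user, billing_range, days_range):
    """Return True if the user falls inside both (inclusive) ranges."""
    low_billing, high_billing = billing_range
    low_days, high_days = days_range
    return (low_billing <= user.billing_usd <= high_billing
            and low_days <= user.use_days <= high_days)


def summarize(users, billing_range, days_range):
    """Re-aggregate the users against the requested segment boundaries."""
    members = [u for u in users if in_segment(u, billing_range, days_range)]
    return {
        "count": len(members),
        "share": len(members) / len(users),
        "user_ids": [u.user_id for u in members],
    }


# Boundaries as a clustering technique might report them.
cluster_view = summarize(USERS, (5.76, 28.84), (3.4, 8.9))

# The same data re-aggregated with boundaries a decision-maker can remember.
rounded_view = summarize(USERS, (5, 40), (1, 7))

print("cluster-style segment:", cluster_view)
print("rounded segment:      ", rounded_view)
```

The two segments do not contain exactly the same users, which is the ambiguity or bias mentioned above. The trade-off is that the second segment can be described in one sentence to someone who is not a data specialist.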
Discard Any Fixation on One Analysis Technique
Many people have done amazing things using incredibly sophisticated and complex analysis techniques in university research and elsewhere. Much of this research expertise has gone into refining machine learning so that it automatically makes the right decision. I think that is amazing.
I understand why people with these backgrounds prefer to use such sophisticated techniques. However, it is also true that this complexity is not effective when the purpose of the analysis is to aid human decision-makers.
Therefore, it is very important to let go of any fixation on a particular technique or line of research, review your approach, and provide results that support human decision-making by considering how best to present and explain them in an easy-to-understand way.
Part 1 in this series: The Importance of Providing Many ‘Obvious’ Results.