Introduction
The chart above was featured in The Economist early this month. It looks at the impact of Uber on the number of alcohol-related crashes in New York City, and it claims to set these numbers against those of other counties.
The key takeaway from the chart is that alcohol-related crashes have fallen since Uber was introduced (marked by the red line on the timeline). The graph does a fine job of showing the drop in crash rates in every county except Staten Island.
However, the chart does not do full justice to the point the author wants to make. Before creating a visualization like this, I would ask a few key questions:
1. What is the key point I am trying to convey?
The author wants to convey that one thing (Uber) led to another (fewer alcohol-related crashes). One of the key ways to support this point would be to show a negative correlation between the two variables: as Uber adoption rises, accidents fall. The chart only marks the date Uber was introduced; it says nothing about whether growing Uber adoption after that date tracks the drop in accidents. The other problem with the visualization is that it shows the number of accidents in general rather than specifically the accidents related to drunk driving.
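A minimal sketch of the correlation check described above, assuming the author had monthly series of Uber adoption (here, pickup counts) and alcohol-related crashes for the same months. The numbers are invented for illustration and do not come from the Economist chart, and `pearson_correlation` is a hypothetical helper name.

```python
import math


def pearson_correlation(xs: list[float], ys: list[float]) -> float:
    """Return the Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances; the 1/n factors cancel in the ratio, so they are omitted.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical monthly figures, made up purely for illustration:
# Uber pickups (thousands) and alcohol-related crashes in the same months.
uber_pickups = [10, 20, 35, 50, 70]
alcohol_crashes = [40, 38, 33, 30, 25]

r = pearson_correlation(uber_pickups, alcohol_crashes)
print(f"r = {r:.2f}")  # a value close to -1 would support the author's point
```

Even a strongly negative `r` only shows that the two series move together; it does not by itself establish that Uber caused the drop.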
2. If I sliced this data over a different time period, could I prove otherwise?
While the overall drop in accidents is clear, there is also a hockey-stick-like uptick after 2012 in Brooklyn and Queens. So if I wanted to argue that the author's claim is wrong, all I would have to do is zoom in on 2012-2013 and show the rising trend.
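To make the window-slicing problem concrete, here is a small sketch that fits an ordinary least-squares slope to a whole series and then to a narrow "zoomed-in" window. The yearly counts are hypothetical, chosen only to mimic a long decline with a late uptick, and `trend_slope` is an illustrative helper name.

```python
def trend_slope(values: list[float]) -> float:
    """Ordinary least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return num / den


# Hypothetical yearly crash counts for one borough (illustrative only):
# a long decline followed by a short uptick at the end.
years = [2008, 2009, 2010, 2011, 2012, 2013]
crashes = [120, 110, 100, 90, 80, 95]

full = trend_slope(crashes)
# Keep only 2012 and 2013, the "zoomed-in" window.
window = [c for y, c in zip(years, crashes) if 2012 <= y <= 2013]
zoomed = trend_slope(window)

print(f"slope over all years: {full:+.1f} crashes/year")
print(f"slope over 2012-2013: {zoomed:+.1f} crashes/year")
```

The full series slopes downward while the two-year window slopes upward, which is exactly why a chart that supports a causal claim should justify the time range it shows.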
3. Why break the data down by county when the claim is about NYC as a whole?
Because the author has split the geography by county, it is unclear whether the trend holds at the city level. Once the counties are rolled up into a single citywide total, the trend may not hold as cleanly.
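Rolling county-level counts up to a citywide figure is a simple sum per period. The sketch below assumes the data is stored as a mapping from county to per-year counts; the structure, the helper name `roll_up_to_city`, and all the numbers are hypothetical.

```python
from collections import defaultdict


def roll_up_to_city(county_counts: dict[str, dict[int, int]]) -> dict[int, int]:
    """Sum per-county counts into a single citywide count per period."""
    city: dict[int, int] = defaultdict(int)
    for per_period in county_counts.values():
        for period, count in per_period.items():
            city[period] += count
    # Return a plain dict ordered by period.
    return dict(sorted(city.items()))


# Hypothetical yearly alcohol-related crash counts (invented for illustration).
counties = {
    "Brooklyn": {2011: 50, 2012: 45, 2013: 52},
    "Queens": {2011: 40, 2012: 38, 2013: 44},
    "Manhattan": {2011: 30, 2012: 22, 2013: 18},
    "Bronx": {2011: 25, 2012: 24, 2013: 23},
    "Staten Island": {2011: 10, 2012: 11, 2013: 12},
}

city_totals = roll_up_to_city(counties)
print(city_totals)
```

In this made-up example, a steady decline in Manhattan is partly offset by rises elsewhere, so the citywide series is flatter than any single county's chart might suggest.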
4. Why a 3-month moving average?
The metric plotted in the graph above is the 3-month moving average. Moving averages smooth out spikes in a trend, yet even after smoothing the chart still shows visible spikes, which indicates high variance in the underlying data. So rather than plotting the moving average, the author might have made a stronger case by simply plotting the absolute number of accidents per year.
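The two metrics can be compared side by side with a short sketch. It assumes a trailing 3-month window (the chart may well use a centered one) and uses invented monthly counts for two years; `moving_average` and `annual_totals` are illustrative helper names.

```python
from collections import defaultdict


def moving_average(values: list[float], window: int = 3) -> list[float]:
    """Trailing moving average; the first (window - 1) points get no value."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def annual_totals(monthly: list[tuple[int, int, int]]) -> dict[int, int]:
    """Sum (year, month, count) records into one total per year."""
    totals: dict[int, int] = defaultdict(int)
    for year, _month, count in monthly:
        totals[year] += count
    return dict(sorted(totals.items()))


# Hypothetical monthly crash counts (invented for illustration).
counts_2012 = [9, 12, 8, 15, 10, 11, 9, 14, 8, 10, 12, 9]
counts_2013 = [8, 10, 7, 13, 9, 8, 7, 12, 6, 9, 10, 8]
monthly = [(2012, m, c) for m, c in enumerate(counts_2012, start=1)] + [
    (2013, m, c) for m, c in enumerate(counts_2013, start=1)
]

smoothed = moving_average(counts_2012 + counts_2013, window=3)
yearly = annual_totals(monthly)
print([round(v, 1) for v in smoothed])  # 22 noisy points
print(yearly)  # one number per year
```

The smoothed series still wobbles from month to month, while the yearly totals reduce the comparison to one number per year, which is the simpler story the post argues for.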
What could the author have done better?
To begin with, the author could have defined the metric more specifically as alcohol-related accidents rather than accidents in general. Beyond that, showing the negative correlation between Uber adoption and the number of alcohol-related accidents, for instance with a scatter plot, would have gone a long way toward explaining the point the author is trying to make. (A scatter plot creates a strong impression when we talk about correlated events, even though correlation does not imply causation.) The author could also have replaced the chosen metric, the 3-month moving average of the number of crashes, with the absolute number of crashes caused by drunk driving, rolled up to the year level. Had these elements been included, I am sure the chart would have gone a great deal further in convincing people of the claim the author is trying to make.