Power of Simplicity!

https://www.clickz.com/wp-content/uploads/sites/2/2016/06/munster.jpg

 

More often than not, we get carried away by today's dashboards, with their fancy effects and whatnot, and overlook the meaning that some dashboards convey precisely because of their simplicity. Throughout our childhood we are taught history so that we can learn from the past. Here we have an old visualization, simple yet powerful, that shows how a few images can change our thinking for the better.

Münster, a town in Germany, produced this visualization back in 1991 to encourage bus use. It beautifully shows the same number of people (72) on bicycles, in cars and in a bus, and the relative space each group occupies on a road.

Traffic-related issues are growing day by day as the number of cars increases at a staggering pace. The day is not far off when we will run out of roads and succumb to daily traffic jams. This is not just a problem for the future: many cities, such as Delhi, already face an uphill task in overcoming this menace.

What I really love about this visualization is how quickly you get the message, loud and clear, in a single glance. It was certainly ahead of its time, made when there weren't many editors or applications to help you build such epic dashboards.

One change I would suggest, for its time, is for the middle picture to show the full extent of the road taken up by 72 cars. The message could also have been conveyed in greater depth with a series of pictures, each with more people than the previous one, to show how the problem scales. The space taken by two buses is nothing compared to the space taken up by 150 cars.
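The scaling series suggested above is easy to sketch numerically. The Python snippet below is a minimal illustration that counts vehicles rather than road area, because the footprint of each vehicle type is not given here. It assumes one person per bicycle and per car, matching the 72 cars described above, and 72 people per bus. These capacities are assumptions, not figures taken from the poster.

```python
import math

# Assumed capacities (hypothetical): one person per bicycle and per car,
# and 72 people per bus, matching the poster as described above.
CAPACITY = {"bicycle": 1, "car": 1, "bus": 72}


def vehicles_needed(people: int) -> dict:
    """Return how many vehicles of each type are needed to carry `people`."""
    return {mode: math.ceil(people / cap) for mode, cap in CAPACITY.items()}


# A series of panels, each doubling the number of people.
for people in (72, 144, 288):
    print(people, vehicles_needed(people))
```

With 150 people, the sketch gives 150 cars but only 3 buses. That widening gap is what the extra panels would make visible.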

Despite such small shortcomings, this visualization enlightens us in more ways than one. To conclude, a dashboard need not be fancy to portray something simple yet meaningful.

High school graduation rate in the USA

Twitter is a popular platform that governments and leaders use to communicate with the public. Since a tweet has a character limit, it makes sense to convey the intended information through charts rather than wordy reports full of statistics. Here is one such tweet from the White House, from December 2015:

Audience and intent – This chart is intended for members of the public who are interested in, and constantly evaluating, the government's performance. The White House wants to convey that the high school graduation rate in 2015 is higher than it has ever been. Implicitly, it presents this as an achievement of the government.

Is the chart meeting its purpose? To a certain extent, and for people without a keen eye for detail, yes: this chart serves its purpose. However, it does not represent the high school graduation rate data in its entirety and leaves room for speculation.

Critique

  • The type of chart used is unclear. A column-like chart is drawn with stacks of books in a 3D effect. Five books represent 75% and 16 books represent 82%, which is quite absurd: a 7-point increase is drawn as a stack more than three times taller.
  • The graduation rate is shown as a percentage. A percentage of what? I assume it is relative to the number of students enrolled in 12th grade.
  • Thinking further, I would want to know whether the number of students enrolled in 12th grade has changed. I assume the proportion of high-school-aged people in the population does not change drastically over the years. If so, the number of high-school-aged kids should grow as the population grows. If enrollment has not increased over the years, the chart may be misleading.
  • The chart also gives no information about dropouts. For example, suppose a school has 120 students in 11th grade, of whom 20 drop out. If 99 of the 100 students promoted to 12th grade pass their exams, the graduation rate could be reported as either 99% (99/100) or 82.5% (99/120).
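The two numeric points above can be checked in a few lines of Python. The sketch below borrows Edward Tufte's "lie factor," which is not used in the original critique. The lie factor is the size of the effect shown in the graphic divided by the size of the effect in the data, so a faithful chart scores about 1. The function names are illustrative.

```python
def lie_factor(data_start, data_end, graphic_start, graphic_end):
    """Tufte's lie factor: relative change shown in the graphic divided by
    the relative change in the data (about 1.0 for a faithful chart)."""
    graphic_effect = (graphic_end - graphic_start) / graphic_start
    data_effect = (data_end - data_start) / data_start
    return graphic_effect / data_effect


def graduation_rate(graduates, cohort):
    """Graduates as a fraction of whichever cohort is chosen as the denominator."""
    return graduates / cohort


# Books in the chart: 5 books for 75%, 16 books for 82%.
print(round(lie_factor(75, 82, 5, 16), 1))  # 23.6
# The same 99 graduates measured against two different denominators.
print(graduation_rate(99, 100))  # 0.99
print(graduation_rate(99, 120))  # 0.825
```

A lie factor of about 23.6 means the 7-point rise is drawn roughly 24 times larger than it is. The second function shows how much the headline rate depends on the choice of denominator.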

Betterment – In chart-making, choosing the appropriate form for the data at hand is of utmost importance. A line chart is ideally suited to showing subtle changes in rates over time. However, high school graduation rates involve several parameters. For a given year, I would like to see the number of people aged 17 to 21 and the percentage of them with a high school diploma. To represent these details, I would use a stacked bar chart. The Y axis shows population (the number of people aged 17–21) and the X axis shows the year. Each bar is divided into two stacks of different colors, representing the number of people with and without a high school diploma respectively.
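The counts behind each stacked bar follow from two inputs per year: the population aged 17–21 and the share holding a diploma. The minimal Python sketch below shows how each bar splits into its two stacks. The field names are hypothetical and the numbers are made up, since the post supplies no data. A plotting library would then draw the two stacks.

```python
from dataclasses import dataclass


@dataclass
class YearStats:
    year: int
    population_17_21: int  # people aged 17-21 in that year (hypothetical field)
    diploma_share: float   # fraction of them holding a high school diploma


def stacked_segments(stats):
    """Split each year's bar into (year, with diploma, without diploma) counts."""
    rows = []
    for s in stats:
        with_diploma = round(s.population_17_21 * s.diploma_share)
        rows.append((s.year, with_diploma, s.population_17_21 - with_diploma))
    return rows


# Hypothetical figures for illustration only; not real census data.
sample = [YearStats(2013, 1_000_000, 0.80), YearStats(2014, 1_020_000, 0.81)]
for year, with_d, without_d in stacked_segments(sample):
    print(year, with_d, without_d)
```

Each bar's total height is the population, so a growing population and a rising rate remain visible as separate effects.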

 

References:

Washington Post – High school graduation rate hits an all-time high

White House archives – More students are graduating than ever

Where Do EPL Players Come From?

In sports, a team's goal is to be successful. As in many sports, success in soccer means winning as many games as possible, making it to the postseason and eventually winning the championship.

How does an organization build a championship-winning team? Many factors can make an impact, and data can help inform recruiting decisions.

Putting ourselves in the shoes of a recruiter, we're looking to put together a new star team in the EPL (English Premier League). One of the best ways to learn is from history, and we are lucky enough to have data on the players currently in the league.

The objective is to find what would make an ideal recruit for our team, based on data only. We want to find the optimal player profile that will help us have a successful season.

Location is important, and this dashboard can tell us a lot about where to look for players:

https://www.tableau.com/solutions/workbook/create-optimal-game-strategies-based-past-results

According to the dashboard:

  • Most players have spent time in other countries, but most of their playing time (>50k days) has been in Europe. The conclusion is that players have experience in European leagues or the EPL.
  • EPL recruitment is concentrated in the EU, but the league also pulls from other countries worldwide.
  • In the club breakdown, most patterns look similar, with a cluster in the EU and variance in the outliers (players from the US, Spain and South America). It is hard to correlate these patterns with current standings.
  • Birth country shows that the EU isn't the only location at the top: players were born in Senegal, Brazil, Argentina and Nigeria, even though no players were recruited directly from Africa.
  • The author digs further into Africa and reveals that a significant number of African-born players play in the EPL, regardless of their recruiting or development country. Players from Africa often come to Europe through France, are developed there and are then recruited from Europe.

Based on common trends, it is fairly conclusive that the focus should be on recruiting players who developed in the EU but have South American, African or European backgrounds.

This dashboard covers location, but it is not enough to tell us what makes the perfect player.

It does not include individual player performance, or which combination of location and performance builds success (if someone figured that out, recruiters wouldn't be needed). The NCAA makes some recommendations on the body fat and other characteristics a higher-level soccer player should have (among other sports):

http://www.ncaa.org/health-and-safety/sport-science-institute/body-composition-what-are-athletes-made

Pros

  • Many different types of visuals are used
  • The “story” aspect of Tableau is used to direct attention to the point the author is trying to make
  • A lot of good data on what teams are already doing
  • Compares different recruiting patterns, though it doesn't show how they affect performance
  • The visuals are all straightforward. It is not confusing to understand any page of the dashboard.

Cons

  • It doesn't let you explore options beyond what teams are already doing, because it only observes current practice (e.g., what happens if I expand recruiting to Antarctica?)
  • You can't focus on a specific team, and there isn't enough information to see how expanding recruiting into new markets has affected performance (e.g., Everton recruiting in South America vs. AFC Bournemouth staying closer to the EU)
  • It doesn't include rankings or team performance

How would I change it?

  • Presuming the goal is to provide actionable insight, more data points need to be added besides location. Location is only an observation. Adding player characteristics, team characteristics (resources) and performance would give the context needed to explain patterns and correlations between players, teams and performance over time. Those explanations could then be turned into recommendations.

How else could I use this data?

  • If I wanted to become an EPL player someday, I could use geography to decide where to play (or develop) and increase my chances of getting into the league. This would be based purely on the geographic indicators in this data.
  • If significant correlations are found between players born in certain countries and performance in the EPL, development investment could be made in those countries (e.g., if players born in Africa are high performing, how do we make the most of that?)

 

This beautiful chart could have been perfect, if only….

The chart below gives an overview of the top ten global brands from 2006 to 2015.

The chart's data comes from interbrand.com, which states that it uses financial data, brand role and brand strength to produce the ranking. Let's take a closer look.

Audience – This data visualization is aimed at a general audience of internet users, particularly those interested in business trends: business people and the curious.

Action – The chart does not provide any actionable insights, or at least none are apparent at first glance (or even after a few glances).

Key Takeaway – The brands that ranked in the global top ten between 2006 and 2015.

What I like about the chart –

Color – The overall choice of color is good. The chart is visually pleasing, and the palette doesn't distract users or misguide their focus. The brand names are clearly visible within the circles. Care has been taken to make the font colors contrast with the surrounding circles, which enhances readability.

Readability – The chart as a whole clearly communicates what it aims to present. There are no unnecessary elements, such as 3D shapes, to hurt readability. The brand legend on the right and the year scale at the bottom are clearly labelled, and the bottom scale is uniform, with one-year increments. The lines between the circles also aid readability by helping the audience trace each brand's journey through the top ten across the years.

What I don’t like about the chart –

The metrics that are used to rank the brands aren’t stated on the chart.

Although this chart is thoughtfully designed, it lacks a number of communicative elements that could have made it much better. It is not apparent at first sight that some brands dropped out of the top ten completely, while others appeared only partway through the period. Some brands dropped out from a low rank, reemerged after a hiatus and went on to gain a top position.

After admiring the chart's aesthetics, one is left with more questions than before once the focus shifts to the trends and data.

Redesign –

Part of the chart's audience is business users and inquisitive general users. The chart would therefore provide more value by highlighting important turning points in a brand's journey through hover tooltips or content filtering. Making that depth selective and optional would let the chart cater to all its users: business, inquisitive and casual. Inquisitive users could focus on a brand of interest, delve deeper into its journey and possibly gain useful business insights.
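The turning points a tooltip or filter would highlight can be derived from each brand's rank series. The minimal Python sketch below assumes ranks are stored per year, with None for years outside the top ten. The brand and its ranks are hypothetical, not Interbrand data, and the function name is illustrative.

```python
def turning_points(ranks_by_year):
    """List the years a brand enters or leaves the top ten.

    `ranks_by_year` maps year -> rank (1-10), or None when the brand is
    outside the top ten that year. A drop-out event records the last rank
    held before dropping out; an entry event records the entry rank.
    """
    years = sorted(ranks_by_year)
    events = []
    for prev_year, year in zip(years, years[1:]):
        before, after = ranks_by_year[prev_year], ranks_by_year[year]
        if before is None and after is not None:
            events.append((year, "entered", after))
        elif before is not None and after is None:
            events.append((year, "dropped out", before))
    return events


# Hypothetical brand, for illustration only.
brand_x = {2006: 9, 2007: 10, 2008: None, 2009: None, 2010: 8, 2011: 5}
print(turning_points(brand_x))
```

Each event could become an annotation or tooltip on the chart, so casual readers see the line while inquisitive readers can hover for the story.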

I made a rough redesign concept based on the above pointers. You can view it here.

In the redesign, Toyota's turbulent journey is highlighted. Users can drill into the whats and whys of how Toyota reemerged in the top ten in 2012 and went on to take 6th place in 2015.

Interactivity allows for a richer, more holistic contextual experience while hiding that level of detail from less interested users.

Source: https://www.reddit.com/r/dataisbeautiful/comments/686l51/top_10_global_brands_20062015_oc/

References: http://www.scribblelive.com/blog/2012/08/06/interaction-design-for-data-visualizations/

http://interbrand.com/best-brands/best-global-brands/methodology/

 

Is a fancy viz required to convey a simple message?

Visualization has been key to depicting lots of information in limited space. But is it being used incorrectly and unnecessarily? Have you ever come across fancy visualizations that convey simple information?

Recently, I came across an article about drinking ages in Canada and found a viz showing provinces and their legal drinking ages. The bar chart (Image 1.0) has provinces on the X-axis and age on the Y-axis. We can clearly see that, apart from B.C., Alberta and Quebec, the drinking age in the rest of the provinces is 19. The article explained this in a single sentence. So the question is: do we need a viz at all, or how can we show this better?

There are multiple flaws in this viz, so let's discuss them in detail. The Y-axis tick labels for age use a step of 0.6, but legal permits are tied to a whole-number age rather than a fractional one. Secondly, age is not counted in base 10: a year is made up of 12 months, so a tick such as 18.6 doesn't correspond to a meaningful age. To rectify the scale, the chart should show age in integers, such as 17, 18 and 19.

Image 1.0 – Drinking age
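The integer scale suggested above can be generated directly from the data instead of relying on a default step. The minimal Python sketch below places one tick per whole year, starting one year below the smallest value so the bars have a baseline. The function name and the `pad` parameter are illustrative assumptions.

```python
import math


def integer_ticks(values, pad=1):
    """Whole-number tick positions from `pad` below the smallest value up to the largest."""
    low = math.floor(min(values)) - pad
    high = math.ceil(max(values))
    return list(range(low, high + 1))


# Ages taking only the values 18 and 19, as in the chart discussed above.
print(integer_ticks([19, 18, 19, 19, 18]))  # [17, 18, 19]
```

In matplotlib, the same positions can be passed to `ax.set_yticks`. Alternatively, `matplotlib.ticker.MaxNLocator(integer=True)` restricts automatic ticks to whole numbers.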

As for the grid lines in the chart (Image 1.0), they are not necessary. The data does not vary in decimal values or fluctuate, and drinking ages vary very little.

Overall, only three provinces have a legal drinking age of 18, and the rest have 19. A bar chart is therefore the wrong choice here, or not needed at all.

How can we improve this chart? If you have any ideas, please feel free to add them in the comments.

I have a couple of ideas to address this. Image 2.0 is an example of region plotting, which can be replicated for our requirement. We can color provinces with a legal age of 19 orange and the rest, with a legal age of 18, blue. In such a chart the regions are clear, and their names can easily be highlighted.

Image 2.0 – Region Plotting

The second option would be to use a simple table, which is easy to read and understand.

Finally, have you reached a conclusion on whether to use a viz in this scenario? Let me help. First, analyze the data and do some profiling, which will help you decide between a viz and no viz. Second, a better understanding of your data helps you plan better. If you plan a chart, take care with the choice of chart and how it is scaled, because both will have a great impact on readers. Otherwise, a table is always a good choice.
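The profiling step suggested above can be as simple as counting distinct values before choosing a format. The Python sketch below is a minimal illustration. The threshold in `suggest_format` is a hypothetical rule of thumb, not an established standard. The sample ages follow the description above, with three provinces at 18 and the rest at 19, and no province names are attached.

```python
from collections import Counter


def profile(values):
    """Basic column profile: row count, number of distinct values and their frequencies."""
    counts = Counter(values)
    return {"rows": len(values), "distinct": len(counts), "frequencies": dict(counts)}


def suggest_format(values, max_distinct_for_table=3):
    """Hypothetical rule of thumb: very few distinct values means a table is enough."""
    return "table" if profile(values)["distinct"] <= max_distinct_for_table else "chart"


# Drinking ages for ten provinces: three at 18 and seven at 19, as described above.
ages = [18] * 3 + [19] * 7
print(profile(ages))
print(suggest_format(ages))
```

Two distinct values across ten rows is a strong hint that a sentence, a table or a two-color map says everything a bar chart would.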

Reference – http://www.parklandonline.com/drinking-age-will-remain-19-in-saskatchewan/

How quitting smoking changes your body

Introduction

We are all aware of the negative effects of smoking on our bodies. Smoking is one of the habits known to reduce life expectancy: it causes cancer and numerous other health complications. A common belief is that chain smokers do not live as long as non-smokers. Cigarette smoking is responsible for about 443,000 deaths each year in the United States. One claim is that the younger you are when you quit, the greater the health benefits, and that quitting at any age adds years to your life.

What do I appreciate about the visualization?

The visualization does an amazing job of showing what happens to the body at each phase after the last puff. The whole idea is to persuade smokers to think about the benefits of giving up smoking. The color scheme, combined with the human anatomy and brief descriptions, gives a good deal of information at first sight. It does a decent job of explaining how the body improves over hours, days, months and years.

What don't I appreciate?

It is a very generic visualization and doesn't target any specific group of smokers to support the claim. There are no numbers showing what proportion of smokers experienced these changes after their last puff. While it covers overall risk factors, it ignores fertility, which is a growing concern for both men and women.

How could it be improved?

It's essential to visualize the changes in the body for different age groups, genders, levels of nicotine dependency and medical backgrounds. A 25-year-old smoker with no other health complications might respond differently from a smoker with asthma, who might take longer to return to normal.

To support the claim, the visualization should include numbers, by age group, on how much longer people lived after they quit smoking. It should also show the health benefits they experienced over time.

References:

http://www.huffingtonpost.com/2014/12/05/effects-of-quitting-smoking_n_5927448.html

 

The Art Of Depicting Data

 

Intriguing, isn't it? The chart above represents the results of a year-long quantified-self project on diabetes control. It plots the daily blood sugar levels of Doug, who conducted the experiment, for the year 2012, along with the miles he ran each day. At first glance, I couldn't tell that something this pretty could mean something this serious.

Doug has been diabetic for 32 years. He claims 2012 was the healthiest year of his life and supports this with the results of his experiment. He tracked every blood sugar reading, every insulin dose, every meal and all his activity data. His visualisation certainly covers the dimensions of data visualisation. To list a few: the chart is visually appealing, so the beauty dimension is covered. It contains his personal experiences, so it is insightful. His results also encourage other patients to work towards blood sugar control, so it is enlightening.

Getting into the details, the chart answers the descriptive questions. What? Blood sugar levels and miles run. When? The year 2012. Who? Doug. It also answers the explanatory questions. Why? To control diabetes. How? Through self-tracking and exercise. It doesn't shy away from predicting that this approach leads to a healthier life, or from prescribing self-tracking and exercise to diabetes patients. Besides plotting his test results and miles run, he has noted important life events, which helps us understand the story he is telling through his chart.

What amazes me most is that this chart contains data from 91,251 blood sugar readings. The month initials in the inner circle are really helpful for tracking the time period. Having said all that about a piece of art that left me awestruck, I can spot one drawback. The choice of colour is what makes this chart art, but the use of white creates a problem. It breaks the connection between the warm and cold colours, making it hard to read the minimum and maximum blood sugar for each day. A smoother transition would be more effective. If I were to redo this, it would be the only thing I would change about the visualisation.
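The smoother transition suggested above can be built as a continuous cold-to-warm scale with no white midpoint. The minimal Python sketch below maps each reading onto a straight line between two colours. The endpoint colours and the 40–400 mg/dL range are assumptions chosen for illustration, not values taken from Doug's chart.

```python
COLD = (49, 54, 149)  # hypothetical blue for low readings
WARM = (215, 48, 39)  # hypothetical red for high readings


def lerp_colour(start, end, t):
    """Linearly interpolate between two RGB colours; t runs from 0 to 1."""
    return tuple(round(a + (b - a) * t) for a, b in zip(start, end))


def colour_for(reading, low=40, high=400):
    """Map a blood sugar reading (assumed mg/dL) onto a cold-to-warm scale.

    Readings outside [low, high] are clamped, so every value gets a colour
    and the scale never passes through white.
    """
    t = (min(max(reading, low), high) - low) / (high - low)
    return lerp_colour(COLD, WARM, t)


for reading in (40, 120, 220, 400):
    print(reading, colour_for(reading))
```

Interpolating in RGB is the simplest option. A perceptually uniform colour space would give more even steps, but the key point is that neighbouring readings always get neighbouring colours.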

A picture is worth a thousand words, indeed! However, today we often see data visualisation done as modern art. Many times the main purpose of the visualisation is lost, and the chart is beautiful but means nothing to the audience beyond pleasing the eye. The chart above shows that we can use the tiniest dimension, such as colour, to create art while also telling our story. A chart should catch the audience's attention, but it must have the content to keep them wanting to know the story.

Please visit the site for a better view of the images: http://databetic.com/?p=304


Washington – An expensive city to live in, really?

When you are planning to move to another city, the first question on your mind is whether the new city will be more expensive than your current one. Will my income cover average expenses, including housing, utilities, transportation and taxes? Recently, while looking for datasets and articles for my group project on which city is the best place to live, I came across this article: Study says Washington is more expensive than New York. It made me wonder whether Washington really is more expensive than metro cities such as New York and San Francisco. The most interesting thing about the article is the visualization used to draw that conclusion.

Here is the visualization:

The above visualization is a simple bar graph showing average annual expenditures on various household items in selected cities.
Y axis – Selected cities
X axis – Categories of household items, namely: 1. Furnishings & Equipment 2. Housekeeping Supplies 3. Household Operations 4. Utilities 5. Housing

The best part about the visualization is its simplicity: it shows expenses by category for each city. Anyone looking at the graph will readily conclude that yes, Washington is more expensive than all the other major cities, since the city with the biggest bar looks like the most expensive one. But is that actually true? Does this visualization answer the real question, i.e., which city is the most expensive to live in? I don't think so.

Firstly, what is expensive? How do you define your needs? If incomes are high and people can afford to spend more, does that make a city expensive? That is the logic behind the above visualization. People in Washington have high incomes and spend a major part of them on housing, but that doesn't mean Washington is expensive to live in. Moreover, the graph only tells us what people spend their money on, and how much people spend is not, by itself, a measure of how expensive a city is.
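A quick way to see why spending alone is misleading is to divide it by income, such as the median household income listed among the factors below. The Python sketch below uses made-up numbers for two hypothetical cities, and the function name is illustrative. The city that spends more in absolute terms turns out to carry the lighter burden.

```python
def spending_share(annual_spending, median_income):
    """Fraction of income that goes to a spending category."""
    return annual_spending / median_income


# Hypothetical cities and numbers, for illustration only.
cities = {
    "City A": {"housing": 24_000, "income": 100_000},
    "City B": {"housing": 18_000, "income": 50_000},
}
for name, c in cities.items():
    share = spending_share(c["housing"], c["income"])
    print(name, c["housing"], f"{share:.0%}")
```

City A has the bigger housing bar, yet housing takes 24% of income there versus 36% in City B. A bar chart of raw spending would rank the two cities the wrong way round.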

Secondly, I think the data collected wasn't enough to answer the question (which city is the most expensive?). Household expenditure alone can't decide which city is expensive, and the data doesn't justify the claim. It would have been better to also collect data on the following:
1. What are the various taxes in the selected cities?
2. What is the median household income?
3. What is the salary by profession, or the salary for common professions?
4. What is the cost of schooling and education?
5. What is the transportation cost?

Only after collecting data on these and other factors should we draw a visualization and a conclusion, ideally comparing and contrasting the cities on each factor. Comparison charts across factors, like this one for SF versus NY, give a better understanding of which city is expensive.

This visualization made me understand how limited data and a simple graph can lead to a misleading conclusion. It is very important to define our needs correctly, in the right context, and to collect enough data from multiple sources. Validating the visualization against the question we want to answer is critical: we need to check whether we have drawn the right visualization. Additionally, evidence that supports the claim makes for a better visualization.

References:

Washington Post, Datausa.io

Beer Belly of the USA!

The visualization below is part of an article-cum-experiment, “Where Bars Outnumber Grocery Stores”, by Nathan Yau, posted in the Data Underload section of his site.

DESCRIPTION:

The author built this map to verify a claim made in an older article. That article argued the central-western region of the country can be called the “beer belly of the country” because bars outnumber grocery stores in and around that area.

The map combines two categories of places, bars and grocery stores, pulled from the Google Places API. The nice thing about the Google Places API is that businesses are categorized and searchable, which makes it relatively easy to count the bars or grocery stores in each area of the country. To build this visualization, the author took a point every 20 miles. He searched within a 10-mile radius of each point for bars and grocery stores and computed their ratio.
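The sampling procedure described above can be sketched as follows. This is a minimal Python illustration, not the author's code. It assumes the bar and grocery store locations have already been fetched (for example from the Google Places API) as (latitude, longitude) pairs. It measures distance with the haversine formula and places the 20-mile grid using a rough conversion of 69 miles per degree of latitude. All function names and the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in miles."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def grid_points(lat_min, lat_max, lon_min, lon_max, spacing_miles=20):
    """Sample points roughly `spacing_miles` apart (1 degree of latitude ~ 69 miles)."""
    points = []
    lat = lat_min
    while lat <= lat_max:
        # Degrees of longitude shrink toward the poles, so widen the step.
        lon_step = spacing_miles / (69.0 * math.cos(math.radians(lat)))
        lon = lon_min
        while lon <= lon_max:
            points.append((lat, lon))
            lon += lon_step
        lat += spacing_miles / 69.0
    return points


def count_within(center, places, radius_miles=10):
    """Number of places within `radius_miles` of `center`."""
    return sum(1 for p in places if haversine_miles(*center, *p) <= radius_miles)


def bar_grocery_ratio(center, bars, groceries, radius_miles=10):
    """Bars per grocery store within the search radius (None if there are no stores)."""
    n_bars = count_within(center, bars, radius_miles)
    n_groceries = count_within(center, groceries, radius_miles)
    return None if n_groceries == 0 else n_bars / n_groceries


# Hypothetical locations around Madison, Wisconsin, for illustration only.
bars = [(43.07, -89.40), (43.08, -89.38), (43.10, -89.35)]
groceries = [(43.07, -89.39)]
for point in grid_points(43.0, 43.3, -89.5, -89.2):
    print(point, bar_grocery_ratio(point, bars, groceries))
```

Returning None where no grocery stores are found keeps empty areas distinct from areas with many stores and no bars, which would otherwise both be hard to colour.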

Basically, the more bars, the darker the brown, and the more grocery stores, the darker the green. In line with the older claim, there is clearly a high concentration of bars in Wisconsin, whereas the rest of the country has significantly more grocery stores.
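One way to implement the brown-versus-green encoding described above is a diverging scale driven by the log of the bar-to-grocery ratio. The post does not say exactly how the author mapped ratios to colours, so the Python sketch below is an assumption-laden illustration. The colours, the +1 smoothing and the saturation point are all hypothetical choices.

```python
import math

BROWN = (140, 81, 10)      # hypothetical: bars dominate
GREEN = (1, 102, 94)       # hypothetical: grocery stores dominate
NEUTRAL = (245, 245, 245)  # equal numbers of each


def ratio_colour(n_bars, n_groceries, max_log_ratio=2.0):
    """Diverging colour: darker brown as bars dominate, darker green as grocery stores do."""
    # +1 smoothing avoids division by zero and log(0) in sparse areas.
    log_ratio = math.log((n_bars + 1) / (n_groceries + 1))
    # Ratios beyond e**2 (about 7:1) in either direction use the darkest shade.
    t = max(-1.0, min(1.0, log_ratio / max_log_ratio))
    target = BROWN if t > 0 else GREEN
    return tuple(round(n + (c - n) * abs(t)) for n, c in zip(NEUTRAL, target))


for counts in ((5, 5), (12, 3), (3, 12), (1000, 0)):
    print(counts, ratio_colour(*counts))
```

Using the log ratio makes "four times as many bars" and "four times as many grocery stores" equally dark on opposite sides of the scale.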

Positives:

The need for the visualization is clear. It aims to find out where in the country bars outnumber grocery stores, and to verify that the central-western region really is the “beer belly of the country”.

The data picked is accurate, since it checks out against the older article that made the initial claim (comparison below).

The calculation of the ratio is also clear and sensible, and the factors considered are plotted precisely on the map, giving it great clarity.

Negatives:

Although the need for the visualization is clear, its interpretation will change with the audience. Conclusions can range from “people in states with more bars must drink a lot” to “people just prefer bars over restaurants”, “bars serve food too” or “people in states with fewer bars just drink at home”. These variations arise because the analysis takes too few factors into account.

First off, a “beer belly” is a place where alcohol consumption is highest, but the data collected says nothing about consumption. Ideally, many factors should be considered before calling a place the beer belly, such as:

  • Alcohol production vs consumption
  • Pubs (liquor only) vs. restro-bars (serve food too)
  • Profits and growth of bars

The data considered is not enough to justify this claim. Its accuracy can also be questioned, since it relies entirely on one source, the Google Places API. Government records, considered the most accurate source, might not match it.

In conclusion, I think the visualization is not the problem; the process of getting to the map is. The data does not do justice to the need.

Sources:

Article: http://flowingdata.com/2014/05/29/bars-versus-grocery-stores-around-the-world/

Author: http://flowingdata.com/about-nathan

Site: http://flowingdata.com/category/projects/data-underload/

 

Visualizing causes of death over age

According to a leading newspaper, more than half (54%) of the 56.4 million deaths worldwide in 2015 were due to the top 10 causes. Heart disease and stroke are the world's biggest killers, accounting for a combined 15 million deaths in 2015. These diseases have remained the leading causes of death globally over the last 15 years.

The visualization below shows statistics on people who died between 2005 and 2014, by cause of death grouped into disease categories. For each age, the graph shows the percentage of deaths due to each cause. The data is also segregated by gender and ethnicity. The Centers for Disease Control and Prevention classifies causes of death into 113 causes. To make the chart less complex, these are grouped into 20 categories of diseases and external causes.

Causes of death over age
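For each age, the chart shows what percentage of that age's deaths fall into each cause group. The minimal Python sketch below shows how raw counts turn into those per-age percentages, which always sum to 100 at every age. The cause names and counts are hypothetical placeholders, not CDC figures, and the function name is illustrative.

```python
from collections import defaultdict


def shares_by_age(deaths):
    """Convert {(age, cause): count} into {age: {cause: percent of that age's deaths}}.

    Each age's percentages sum to 100, which is what a 100% stacked area chart plots.
    """
    totals = defaultdict(int)
    for (age, _cause), count in deaths.items():
        totals[age] += count
    shares = defaultdict(dict)
    for (age, cause), count in deaths.items():
        shares[age][cause] = 100 * count / totals[age]
    return dict(shares)


# Hypothetical counts for illustration only, not CDC figures.
deaths = {
    (30, "external causes"): 60, (30, "circulatory"): 20, (30, "cancer"): 20,
    (70, "external causes"): 5, (70, "circulatory"): 45, (70, "cancer"): 50,
}
print(shares_by_age(deaths)[70])
```

Because every age is normalized separately, the chart shows relative importance at each age, not how many people died at that age.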

What I liked:

The stacked area chart can show the data by gender and by ethnicity (e.g., White, Asian). As shown in the image below, clicking a color band in the graph displays the age and percentage of people affected by that disease group.

Showing a specific disease by clicking on the area

This makes it easier to see the impact of a disease group as age progresses. The graph also has Next and Previous buttons for moving between sections, and a Show All button that returns to the full graph. The gender and ethnicity filters are well thought out and provide insight for a targeted group.

What can be improved:

  1. As mentioned earlier, the chart shows data from 2005 to 2014. However, even when you hover around the chart, it is difficult to tell what percentage of people died from a specific disease group, and at what age.
  2. The chart shows combined data for 2005 through 2014. An additional graph showing the trend over time would be a good enhancement, indicating which types of disease are causing more deaths over time.
  3. Causes of death cannot be compared across ethnicities or genders in this chart. Applying a filter shows causes of death for men or for women, but the two cannot be compared side by side.

How to improve to derive better insights:

  1. A tooltip could be added to the existing graph that shows the exact percentage and age for a specific cause. This would help users who want precise facts as well as trends. This example of Texas oil rigs (Click Here) shows how a tooltip can be used to extract precise information from a chart.
  2. Add functionality to compare trends between genders or ethnicities. This can be done either by adding a multi-select filter to the existing graph or by creating additional graphs comparing men and women, and different ethnicities.
  3. As mentioned in the second point of the previous section, the graph does not show the trend over time. A time-series animation showing percentage, age and year together would reveal these trends. This Wealth and Health of Nations example (Click Here) shows how time-series animation can supercharge analysis when two dimensions other than time matter most.
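The first two suggestions above need only simple lookups on the per-age percentages behind the chart. The Python sketch below formats an exact-value tooltip and computes the percentage-point gap between two groups, such as men and women, for one cause at each age they share. The data layout ({age: {cause: percent}}), the function names and the sample values are all hypothetical.

```python
def tooltip(shares, age, cause):
    """Text for a hover tooltip: exact age and percentage for one cause."""
    return f"Age {age}: {shares[age][cause]:.1f}% of deaths from {cause}"


def compare_groups(shares_a, shares_b, cause):
    """Percentage-point difference (group A minus group B) for one cause at each shared age."""
    common_ages = sorted(set(shares_a) & set(shares_b))
    return {
        age: shares_a[age].get(cause, 0.0) - shares_b[age].get(cause, 0.0)
        for age in common_ages
    }


# Hypothetical per-age percentages, for illustration only.
men = {50: {"circulatory": 30.0, "cancer": 35.0}, 60: {"circulatory": 32.0, "cancer": 38.0}}
women = {50: {"circulatory": 22.5, "cancer": 40.0}, 60: {"circulatory": 25.0, "cancer": 41.0}}

print(tooltip(men, 50, "cancer"))
print(compare_groups(men, women, "circulatory"))
```

Plotting the output of `compare_groups` as a single line per cause would let readers compare men and women directly, without switching filters back and forth.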

The stacked area chart is a good way to visualize this problem, but it can be improved in the ways described above to give viewers better insights. Looking at the same data from different perspectives can expose facts hidden in it, as adding trends over time would do here.

References:

  1. https://flowingdata.com/2016/01/05/causes-of-death/
  2. Wealth and Health of Nations http://goo.gl/9nPEUC
  3. Using tool-tip https://public.tableau.com/en-us/s/gallery/texan-oil-rigs
  4. Using multi-select filters https://public.tableau.com/en-us/s/gallery/ice-melting (Years drop-down is multi-select)