The Beauty of Roses

This week I watched an old movie about Florence Nightingale. It was a really great movie, and I was blown away by how remarkable Florence Nightingale was. She is best known as the Lady with the Lamp, the founder of modern nursing who cared for thousands of soldiers in appalling conditions during the Crimean War.

Later on, I found out that Florence Nightingale was also a superb statistician. In 1857, she created a revolutionary and controversial diagram called the rose diagram. It pushed the British government to create better and cleaner hospitals.

This is the Nightingale Rose chart.

The chart illustrates soldiers’ deaths in each month from 1854 to 1856, broken down by cause of death using differently colored “rose petals”.

The message that the diagram delivered was potent and direct – hospitals can kill. It’s also fascinating that the diagram revealed that, if the right improvements were made, those mass deaths in the hospitals could be avoided.

There are 4 benefits of the Nightingale Rose chart: 1) The color is very eye-catching, so the audience is willing to read more at first sight; 2) Each slice takes an equal sector of the circle, making labeling much cleaner; 3) Each slice still maintains an accurate area comparison with other slices (by making the radius of the slice proportional to the square root of the value); and 4) Nightingale also placed a contrasting rose chart beside it to show that the deaths could be avoided with the right improvements.
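To make the third benefit concrete, here is a minimal sketch of the square-root rule. The monthly counts are hypothetical values chosen for illustration, not Nightingale's data, and the function names are my own.

```python
import math

def wedge_radii(values, scale=1.0):
    """Return one radius per value so that wedge area is proportional to the value."""
    return [scale * math.sqrt(v) for v in values]

def wedge_area(radius, n_wedges):
    """Area of a circular sector that spans 1/n_wedges of the full circle."""
    theta = 2 * math.pi / n_wedges
    return 0.5 * radius ** 2 * theta

# Hypothetical monthly counts, for illustration only.
monthly_deaths = [100, 400, 900]

radii = wedge_radii(monthly_deaths)          # [10.0, 20.0, 30.0]
areas = [wedge_area(r, 12) for r in radii]   # 12 equal-angle wedges, one per month

# A value 4 times larger gets a radius only 2 times larger, but an area 4 times larger.
print(radii)
print([round(a / areas[0], 2) for a in areas])
```

If the radius were made proportional to the value itself, the 900 wedge would cover 81 times the area of the 100 wedge instead of 9 times, which would badly exaggerate the difference.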

Nightingale was one of the first to use a statistical graphic as a call to action. The purpose of the graph was to convince the public that epidemic disease could be controlled, and to force the British government to spend money on sanitation.

This is exactly what we are trying to achieve in our data visualization class. The purpose of a diagram is to make a claim and create value.

The Nightingale Rose chart illustrates how powerful a good visualization can be. But it also occurred to me that the larger blue “rose petal” could be misleading. We cannot cherry-pick how we present the data. Representing raw data visually should reveal, not conceal.

Reference:

  1. Worth a thousand words

The Economist

https://www.pinterest.com/pin/128704501821544284/

  1. Did Nightingale’s ‘Rose Diagram’ save millions of lives?

http://www.florence-nightingale-avenging-angel.co.uk/?p=462

  1. Florence Nightingale — História da Enfermagem — O filme completo

https://www.youtube.com/watch?v=sYZnzt0CJtE

Visualization showcasing death rates from air pollution.

Dashboard – https://ourworldindata.org/

Description:

This area chart visualization presents the death rates across the world caused by air pollution from three sources, namely indoor solid fuels, particulate matter, and ozone. The death rates shown are per hundred thousand, from 1990 to 2015 in steps of five years.

What I like about this dashboard:

  1. An area chart is effective for visualizing the magnitudes of a connected-series dataset because the space between the line segments and the x-axis is filled. So, a person can observe the change over time effectively. This is harder to see in other chart types such as line graphs.
  2. This dashboard presents both the absolute and the relative trend of the death rate, either for the world or for any country.
  3. It is interactive and includes a drop-down menu of countries. One can view the pattern for the entire world or for any single country just by selecting it from the drop-down menu.
  4. Another good feature is that one can view the death rate for any individual source of air pollution or for a combination of two or three, such as the death rate pattern from ozone only, or from particulate matter and solid fuels together.
  5. It also gives the death rate both in absolute terms and relative to other regions.

Cons of visualization tool used:

  1. Data in one segment is hidden behind the data in another segment. When I visualized the death rate pattern from ozone alone, the green area was thicker than when all three areas were enabled. This is undesirable as it does not convey the actual trend.
  2. Generally, to get a value for a point on a curve, we look at its y-coordinate. However, in this case, to get a value, we need to subtract the lower y-coordinate of an area from its upper y-coordinate. This makes it difficult to compare the relative values of the three areas at first glance.
  3. For the absolute trend, as one moves from left to right in the chart, the y-coordinate of the green area (deaths from ozone) decreases, which gives the impression that deaths from ozone decreased from 1990 to 2015. However, the deaths from ozone do not change over the years, as reflected by the (almost) unchanging width of the green area. This is highly confusing. The confusion also arises from the fact that the green area is very thin, making it look like a line curve. Hence, a change in y-coordinate gives an impression of a change in magnitude.
  4. The shape of the entire chart (the group of three areas) depends on the order in which the three areas (green, red, blue) are stacked vertically. Hence, a change in order will significantly alter the shape of the chart. For example, if the green area, which is almost constant in width, is kept at the bottom, the entire chart will look more stable. This is undesirable because visualizations of the same data should look similar.
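The second, third, and fourth points can be reproduced with a few lines of arithmetic. This is a minimal sketch with made-up numbers (not the Our World in Data figures), and `stack` is a helper name I introduce here for illustration.

```python
def stack(layers):
    """Stack layers bottom to top; return one (lower_edge, upper_edge) pair per layer."""
    edges = []
    baseline = [0.0] * len(layers[0])
    for layer in layers:
        top = [b + v for b, v in zip(baseline, layer)]
        edges.append((baseline, top))
        baseline = top
    return edges

# Hypothetical death rates at three time points, for illustration only.
solid_fuels = [60.0, 50.0, 40.0]   # declining
ozone = [5.0, 5.0, 5.0]            # constant

# Ozone drawn on top of solid fuels: its upper edge falls over time...
ozone_lower, ozone_upper = stack([solid_fuels, ozone])[1]
# ...but its actual value (upper minus lower) never changes.
ozone_thickness = [u - l for l, u in zip(ozone_lower, ozone_upper)]

# Same data with ozone at the bottom: now its upper edge is flat.
_, ozone_upper_reordered = stack([ozone, solid_fuels])[0]

print(ozone_upper)            # [65.0, 55.0, 45.0]
print(ozone_thickness)        # [5.0, 5.0, 5.0]
print(ozone_upper_reordered)  # [5.0, 5.0, 5.0]
```

The falling upper edge in the first ordering is exactly what creates the false impression of declining ozone deaths, while the reordered stack shows the same series as a flat band.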

Let’s move on to the critical analysis:

  1. Does this visualization carry any goal, does it have any purpose? I believe that the visualization severely lacks a purpose and its goals are quite unclear. Hence, I would not categorize it as enlightening.
  2. Considering the domain, two things came to my mind: the audience and its needs.

Audience – I am unable to identify the target audience for this visualization. If it is intended for the worldwide social or environmental agencies, I do not think that the information provided is sufficient to fulfill their needs or can help them to decrease the level of air pollution caused by any of the sources.

Let’s take an example:

Knowing the trend of the death rate from particulate matter over the years does not serve any purpose unless it comes with further information about the types of particulate matter causing the deaths, such as whether they are man-made, natural, or both, and their respective shares of the deaths. There can be various types of particulates, like the ones resulting from dust storms, volcanic eruptions, or chemicals such as oxides, nitric acids, etc. Similarly, nearly half of the world’s population still relies on burning solid fuels such as wood, animal dung, crop residue, and coal for their day-to-day household needs. Therefore, to get a better picture and to know the root cause of air pollution, further details providing death trends for these sub-types should have been included. Hence, I feel that the visualization is not insightful.

  1. Claim: This visualization does not showcase any claim, either about any particular air pollution source or any region most adversely affected by any kind of air pollution. As there is no claim, there is no warrant that provides any reasoning behind the arguments.
  2. Rebuttal: The viewers cannot raise any counter-argument as there are no arguments presented in the visualization. If the designer of this visualization thinks that this dashboard is sufficient for health or environmental organizations to work with the death rate numbers, my rebuttal would be that it is not sufficient, as is evident from the points listed in this blog post.
  3. The data comes in five-year steps, so one cannot get the actual information about the condition in the intermediate years. Subjects like death rates require continuous data to analyze the situation across years.
  4. The authors have completely missed the connection between sources of air pollution and death. That connection is a “disease”, which is caused by air pollution. Air pollution leads to the death of a person through a disease. People do not die just by inhaling harmful particles. Air pollution caused by any of the listed sources can result in lung failure, heart problems, etc. Hence, I do not find the visualization numbers “convincing”.

What could have been done better –

  1. Use of multiple sources of data: Death rate is a very sensitive subject, so a visualization designed from only one data source is less effective than one designed from multiple data sources that include root causes, effects, and continuous information from 1990 to 2015. The continuous data would also help in making predictions for the coming years, which cannot be made currently.
  2. This visualization does not give any comparative analysis of the effect of air pollution in various regions. Bar graphs could have been used for this purpose.

Below are the links showcasing some similar visualizations in air pollution. I would not say that these visualizations are a perfect substitute or they address every weakness raised above, but it appears that they carry a purpose and can be helpful for fulfilling the goal of their audiences.

Redesign:

As this visualization does not carry any specific goal or any specific actions to meet that goal, it cannot be categorized as visual confirmation. It can fit in the visual exploration quadrant, though. I came up with some useful visualizations from the data provided in this chart.

Comparing similar visualizations:

  1. http://www.scoop.it/t/classroom-geography/p/4018472031/2014/03/27/infographic-deadly-air-pollution-where-and-how
  2. http://www.wri.org.cn/en/node/41165
  3. You can view my redesigned part here – https://docs.google.com/a/scu.edu/document/d/1X1XZyh1MgFW0B3VsvxKV1aehY2M391skNTudTDfAKg8/edit?usp=sharing
  4. Tableau public work: https://us-east-1.online.tableau.com/#/site/magarwalscuedu/workbooks/46729/views

 

 

THE US TUITION INCREASE

Once upon a time in America, students paid for college with the money they made from their summer jobs. Then over the course of the next few decades, public funding for higher education was slashed. These radical cuts forced universities to raise tuition year after year, which in turn forced the millennial generation to take on crushing educational debt loads, and everyone lived unhappily ever after.

From January 2006 to July 2016, the Consumer Price Index for college tuition and fees increased 63 percent, compared with an increase of 21 percent for all items. Competition is one reason. As schools try to attract top-tier students, the costs of hiring brand-name faculty members, building expensive facilities, and offering comfortable student amenities all add up. All these factors combined produce headache-inducing tuition rates at both private and public universities.
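To put the two CPI figures side by side, here is a small sketch of how much tuition rose beyond general inflation. It assumes both percentages cover the same period, as stated above; the function name is my own.

```python
def real_increase(nominal_increase, inflation):
    """Increase after removing general inflation; both arguments are fractions (0.63 = 63%)."""
    return (1 + nominal_increase) / (1 + inflation) - 1

tuition_cpi_increase = 0.63    # college tuition and fees, Jan 2006 to Jul 2016
all_items_cpi_increase = 0.21  # all items, same period

tuition_real = real_increase(tuition_cpi_increase, all_items_cpi_increase)
print(f"{tuition_real:.1%}")   # prints 34.7%
```

The two indexes are divided rather than subtracted because both are compounded price levels, so simply taking 63 minus 21 would overstate the gap.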

The following visualization shows the average in-state tuition and fees for one year of full-time study at a public four-year institution from 2005/06 to 2015/16 for different American states.

One of the best ways to analyze data is through data visualization. Data visualization turns numbers and letters into aesthetically pleasing visuals, making it easy to recognize patterns and find exceptions.

We understand and retain information better when we can visualize our data. With our decreasing attention span, and because we are constantly exposed to information, it is crucial that we convey our message in a quick and visual way. Patterns or insights may go unnoticed in a data spreadsheet. But if we put the same information on a chart, the insights become obvious.

So what’s good about this visualization?

  • The dashboard incorporates a significant amount of data, making it easy to compare the metrics and convey them in context.
  • The dashboard has a flow structure that lets the user view data by time scale, such as year, and identify the trend year after year.

What can be changed

  • Too much data, too close together – this dashboard doesn’t have enough room to breathe, giving users data overload. It’s also poorly structured, making it extremely difficult to interpret what information the chart is displaying, especially at a glance.
  • Confusing colors – The plain background is quite helpful as it makes the visualizations stand out; however, the subtle variation in shade actually makes it more difficult to differentiate between the lines for each of the cities.
  • Make visualizations clear and precise. It is not a good idea to include all the information in a single visualization: if it cannot be digested easily, it doesn’t serve our purpose. So, it’s better to enable drill-downs to navigate to more detailed information from the main visualization.

How to fix it:

  • Use of vibrant and distinct colors – colors such as green, blue, yellow, or red could be used to indicate different ranges of percentage increase.
  • Add options to drill down – a drill-down option could be used to show the point where a state reached a percentage value of 75%, 50%, or 25%.

The dashboard could be broken down into multiple dashboards that group the states by region. This would make the data easier to read and digest.

References:

 

https://www.geckoboard.com/blog/5-terrible-dashboard-designs-and-how-to-fix-them/#.WQPdh4jyu00

 

Aliens among us

For this blog post I picked this infographic because I found it to be somewhat interesting. It doesn’t really have a concrete claim (like aliens are among us or aliens are not among us), but rather tries to inform us about people’s opinions on the matter. Is this a good visualization? I think so: because the topic itself is a bit hard to think about seriously, it is good to have an infographic that has a sense of humor!

PROS:

  1. A visualization has to support its claim. Since the claim here simply educates people about people’s opinions rather than asserting that aliens are indeed living among us, I think the visualization supports it very well.
  2. It looks visually appealing, and the graphs and indicators are self-explanatory and don’t need any extra legend. Some gaps could have been chosen better, but I will mention that in the cons section.
  3. Sources and claim are included in the graph itself, so it needs no article to go along with it. In addition, because the graph has the claim written on it, it is virtually impossible for someone to misuse the graph to prove their own claim (unless their claim is the same as the graph’s).
  4. Believers and skeptics are shown on the map as well as in a separate list in descending order. I think the map visualization gives an interesting perspective on how neighboring countries have very different beliefs. For example, Canada and Mexico are listed as non-believers while the USA is listed as a believer.

CONS:

My main concerns are with the data itself and how it was acquired: the numbers give no indication of how the survey was taken or what the sample size was.

  1. One in five people believes that there are aliens living among us. How exactly did they calculate it? Is it calculated across all of the surveyed countries, or is it just based on US numbers? This is unclear and, given the huge gap between believer countries and non-believer countries, should be taken with a grain of salt.
  2. What were the exact question and response options? I have a hard time believing that 20% of people think there are aliens living among us. For example, my grandmother saw something that looked like a spaceship a long time ago, but I know she never believed there were aliens on earth. Personally, I do believe there is life somewhere in the universe, but I don’t think there are aliens living on earth. And definitely not among us.
  3. The number of believers compared to age also seems iffy; it would be nice to know the sample size of each group. It also seems a bit illogical that the number of believers drops with age: if you believe in something that is hard to disprove, why would you suddenly stop believing? That said, I did find another article that showed a correlation, saying that older men believed less in the existence of aliens compared to younger men.
  4. The pie charts on the infographic are a little harder to read and compare; I think this graph should have been a bar graph. Bar graphs are much easier to compare to each other, especially when the differences in numbers are not very big.
  5. The split between believers and non-believers is not well defined: 21% in Spain is not too far from 16% in the UK.

http://www.newsweek.com/most-people-believe-intelligent-aliens-exist-377965

Nintendo Sales Trend Graph

This graphic is a line graph designed to show that Nintendo’s hardware sales had a negative trend from 1998 to 2006 in both its home consoles and its handheld gaming systems. The graph shows that on the home console side, Nintendo had declining to flat sales from 1998 to 2006, with sales never moving past 10 million consoles sold. While sales spiked from 2006 to 2008, they dropped very quickly afterwards. The graph also shows that while Nintendo had more success in selling handheld systems (it consistently sold more of them than home consoles), even these sales saw a sharp decline from 2009 to 2016. The graph is part of an article arguing that Nintendo’s future in many ways depended on whether or not the sales of its new console (the Switch) could reverse the poor sales trend. The article uses the graph to imply that if the Switch sells poorly, then Nintendo might never be able to reverse the trend.

There are several things that the graph does well. First, it starts the y-axis (sales) at 0, which helps keep it accurate. Second, the graph is very clear and easy to understand. The two colors are easy to differentiate, and the legend and labels make it easy to understand how many systems were sold in which year. The use of alternating colors for each year also does a nice job of giving the graph a clean look without being too boring to look at. The graph does a good job of documenting its sources (Nintendo and Statista), and specifies to the audience that its time frame is in fiscal years. Finally, the graph is extremely functional at showing its main objective: that Nintendo’s sales have been in decline. Regardless of which audience is looking at the graph, it should be clear to anyone that Nintendo has been in trouble, and that it needs to sell really well really soon.

That being said, there are several things that could improve the graph. First, while the graph does a good job at conveying basic information, it doesn’t do a great job at showing potentially why Nintendo has struggled. Because of this, it is a bit difficult to draw solid conclusions from the graph.

I feel that this graph would benefit from listing important events, such as when Nintendo released different consoles during this period. For example, the graph could mention, whether with dots and a legend on the graph itself or with a timeline below it, that Nintendo released its GameCube system in 2001 (the period with the flattest home console sales, in part due to increased competition), and released the Wii system in 2006 (which caused the sudden increase and decrease in sales). Adding these events would give the reader a better picture of Nintendo’s struggles. Without knowing any of this, one might be confused as to why Nintendo’s sales have been down, or one might think that Nintendo’s lack of sales is because the company hasn’t released any new systems. The graph could also benefit from showing the sales trend lines for Nintendo’s competitors, the PlayStation and Xbox lines. Without these comparisons, the audience might not reach the intended conclusions. For example, Nintendo selling a combined 25 million systems between 2002 and 2003 might not sound bad on its own. However, if we saw that Sony had sold 40 million PlayStation 2s during the same time period, then the audience would really get a sense of how much trouble Nintendo was in at that time. Another thing that might help audiences is a mention somewhere that handheld systems are less expensive than home console systems. Again, this would help prevent audiences from thinking that Nintendo’s high sales in handheld systems were offsetting Nintendo’s troubles in home console sales.


http://static1.businessinsider.com/image/58790081ee14b6c7148b7fe9-1200/20170113nintendohardware.png

http://www.businessinsider.com/nintendo-console-sales-chart-switch-2017-1

 

Did you ever know this?

After writing my first blog post and discussing visualization for 4 weeks, I can certainly see a change in my perception of data and information. This deeper thinking prompted me to search through the many charts, reports, and visualizations available all over the internet. What I wanted was to find something that would help me understand all the terminology, principles, properties, and recommendations we learned from Professor Schermann. As I am always excited to learn about new things and unexplored facts, this report became the end of my search, and I am sure that many of you are wondering which country is at the top for what. So, here you go!!

http://www.informationisbeautiful.net/visualizations/because-every-country-is-the-best-at-something/

This chart is based on research by David McCandless along with Stephanie Smith and Esther Kersley. The original version of the research came out in 2009 and is also available at the link above. The data from which the research is derived is attached as well, and the spreadsheet shows all the underlying information.

The author claims to show what each country is best at. The entries are divided into 9 categories: commodity, psychology, ecology, gastronomy, economy, nicety, humanity, technology, and nasty. This makes the chart interesting enough to connect with the audience immediately. For example, I suddenly wanted to check out the humanity category, especially for female entrepreneurs, and Zambia won my heart. The backbone of the report is its attached documentation. We have discussed that validation is very important, and documentation gives us a basis for relying on the information. Below are some terms which we discussed in class and which can be easily understood with this chart.

1. Claim: According to data, every country is the best at something. This claim is derived mostly from the data available for top ten countries in *.

https://docs.google.com/spreadsheets/d/11uifsxtHKwRysrxNxTDhvWLDHTlxQ0jYP8PODoLM2hM/edit#gid=1130095511

2. Warrant: It is the relation you derive from the data to explain your claim. Example: Australia is at the top in the world for cyber security incidents. The report comes from PwC, which compares the number of such incidents among countries, and Australia has 9,434 such incidents, the highest of any country.

http://www.pwc.com.au/press-room/2015/cyber-security-risks-oct15.html

3. Backing: Backing supports your warrant, and to validate this point I checked some published reports. One insight for all of us could be that Croatia is number one in kidney transplants; this was verified by an article on NCBI. There is sufficient growth in organ transplants to support the argument.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3610255/

4. Rebuttal: It is the counter-indication which makes the claim dubious. A rebuttal here can be seen in the example of child brides, a category in which Niger is claimed to rank highest. There are six countries that do not specify a minimum age for marriage, which could affect this claim. There might be no data available for these countries in this regard.

https://www.weforum.org/agenda/2016/09/these-are-the-countries-where-child-marriage-is-legal/

5. Where the claim qualifies: David reveals all of his data sources and accepts that for some countries there is no data available. Hence the claim qualifies only for the participating countries and for these nine categories. Anything beyond what they cover could be another surprise.

Though the chart does not explain the facts, it is very insightful. It might be very enlightening for commodity businesses because now they can know which country sells the cheapest Nikes and that Botswana is the best place to get diamonds from. Otherwise, my wild guess for diamonds would definitely have been China. This chart is truthful, as most of the facts include the sources they were derived from. As an example of validation, World Atlas also reports that the worst country for child labor is Eritrea.

http://www.worldatlas.com/articles/worst-countries-for-child-labor.html

The last and most crucial part is to understand the domain. Context is such a powerful tool that it can turn things around. A good example is this chart showing Singapore at the top for having the healthiest people. The data is derived from Bloomberg rankings, which are based on factors like birth and mortality rates and causes of death. USA Today makes the same kind of claim but differs in its listing, putting Qatar at the top. The debate is not about who is right but about context, because 24/7 Wall St’s ranking is based on factors broadly categorized as health indicators, access measures, or the economy. Here you see how context can bring out a different picture, so be aware of your domain while making a claim!

References:

https://www.usatoday.com/story/money/2015/04/03/24-7-wall-st-healthiest-countries/70859728/

and all of the hyperlinks above.

People Don’t See Social Media as an ‘Important’ News Source

“Social Media is not about the exploitation of technology but service to community”

In times like these where the world is constantly changing, keeping oneself up-to-date with current news and affairs is becoming increasingly important. Though not everything that happens around the globe has a significant immediate impact on each one of us, being aware and well informed surely holds an excellent value.

While surfing through Facebook, Twitter, or say LinkedIn, I often stumble upon some very interesting news feeds, articles, and blog posts. I must admit that I spend more time on these sites than on any of the official news websites. Hence, for me, social media happens to be a chief source of news. It is really thought-provoking how social media brings together news, trends, and best practices from various parts of the world. Facebook and Twitter also provide platforms that host live videos and real-time updates.

Having said all this, recently I came across an article which left me wondering. The article – People Don’t See Social Media as an ‘Important’ News Source – claimed that one in ten US adults get news on Twitter and four in ten get news on Facebook. It had a couple of pie charts which showed that 17% of US adults use Twitter and 10% get news from Twitter. Along similar lines, it showed that 66% of US adults use Facebook and 41% get news on Facebook. Reading through the entire article, I learnt that it also provided some more information, like the importance level of these sites and how the younger generation perceives them.

Here’s my analysis of the visualization and the data presented in this article.
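One number the pie charts leave implicit is the share of each platform’s own users who get news there. A minimal sketch, assuming all four percentages are measured against the same base of US adults (as the charts suggest); the function name is my own.

```python
def share_of_users(news_pct, user_pct):
    """Fraction of a platform's users who get news on it, given two shares of all adults."""
    return news_pct / user_pct

twitter = share_of_users(10, 17)    # 10% get news there, 17% use it
facebook = share_of_users(41, 66)   # 41% get news there, 66% use it

print(f"Twitter:  {twitter:.1%}")   # prints Twitter:  58.8%
print(f"Facebook: {facebook:.1%}")  # prints Facebook: 62.1%
```

Seen this way, the two platforms are much closer than the headline "one in ten" versus "four in ten" suggests: roughly six in ten users of either platform get news on it.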

What did I like?

  • The caption of the figure itself summarizes the findings the pie charts want to convey.
  • By putting the numbers in a scale of 10, we can easily interpret what role Facebook and Twitter play in conveying news to the adults in the US.
  • It also gives a quick comparison of the popularity of two of the biggest social media platforms today and how adults regard them.

What more could it include?

Audience: Thinking about which groups of users would find this information useful, they would perhaps be the website hosts (Facebook and Twitter), news networks (having their official pages and accounts on FB & Twitter), and lastly the curious general public (like me!). As our professor has been mentioning in class, every claim must promote some action. This chart is not actionable as it does not really provide much detail as to what action each of these audience categories could take for their benefit. If it provided some details on which specific news channel accounts/pages are being followed/liked by those 10% of users (Twitter) and 41% of users (Facebook), it would help in understanding the demand for various news networks. It would enable a news network to take appropriate decisions to improve its visibility. It would also give Facebook and Twitter an opportunity to work on their recommendation/suggestion algorithms.

Importance Level: The table included below the graph shows 3 levels of importance cited for FB and Twitter as a source of news – most important, important, not very important. If this information had been included in the pie chart itself, the chart would have been more descriptive. Apart from that, the math seems to be incorrect because the percentage breakdown for Twitter users adds up to more than 100.

Age group: The chart only focuses on US adults. However, the article also mentions that younger Facebook and Twitter users tend to see the services differently than their older counterparts. As a viewer, I would also want to see the statistics for the younger generation since they have a comparatively better hold on technology. Hence it makes me wonder whether the article title really holds true.

Other sources: The author could have also included data on what other sources people use to catch up with the news, if not Facebook or Twitter. Hence, to me, the chart is not completely functional.

Aesthetics: Using bright, relevant colors and a bigger font would help reach a larger audience. The colors used also play a role in engaging the viewers.

Better Design – The author could have also furnished more insights into their research by doing something like this:

http://www.journalism.org/2013/11/14/news-use-across-social-media-platforms/5_profile-of-the-social-media-news-consumer/

The above visualization released 2 years back has a similar domain and is way more descriptive as it also categorizes the audience, news sources and social networks into different classes.

Redesign – In the link below, I have tried to redesign the visualization to present my ideas, based on the data the author must have had after the research.

https://drive.google.com/a/scu.edu/file/d/0Bz_BJfR_3JJDVGhaTEJEUjM0Qm8/view?usp=sharing

Conclusion: The article does provide some very interesting insights. But the charts included do not seem to be complete and are less instructive than they could be. Working more on the charts and the visualizations would empower the audience to act in a specific direction!

References:

https://www.gooddata.com/blog/5-data-visualization-best-practices

https://www.digitalready.org.au/training/social-media/why-have-an-online-presence/the-importance-of-social-networking

 

 

Power of Simplicity!

https://www.clickz.com/wp-content/uploads/sites/2/2016/06/munster.jpg

 

Often than less we get euphoric while viewing dashboards of today with fancy editing and what not thus overseeing the true meaning that some dashboards tend to portray due to their simplicity. All through our childhood we are taught history as a subject to learn from the past and here we have an old visualization, simple yet powerful, depicting how a few images can change our thinking for better.

Munster, a town in Germany, produced this visualization back in 1991 to encourage bus use. It beautifully shows impact of same number of people (72) on bicycles, cars and a bus and the relative space that each occupies on a road.

Traffic related issues are growing day by day as the number of cars are increasing at a staggering pace. The day is not far when we’d run out of roads only to be succumbed by the daily traffic jams. This issue is not just for the future but a lot of cities like Delhi are currently facing an uphill task to overcome this menace.

What I really love about this visualization is how quickly, in a single glance, you get the message loud and clear. It was certainly ahead of its time, made when there were few editors or applications to help you build such epic dashboards.

One change I would suggest, given its era, is for the middle image to show the complete picture of the road space taken up by the cars. The message could also have been given greater depth with a series of pictures, each showing more people than the one before, to show how the problem scales: the space taken by two buses is nothing compared to the space taken up by 150 cars.

Despite such small shortcomings, this visualization enlightens us in more ways than one might think. To conclude, a dashboard need not be fancy to convey something simple yet meaningful.

Highschool graduation rate in USA

Twitter is a popular platform used by governments and leaders to communicate with the public. Since the length of a tweet is limited, it makes sense to convey the intended information through charts rather than wordy reports full of statistics. Here is one such tweet from the White House, posted in December 2015.

Audience and intent – This chart is intended for members of the public who are interested in, and constantly evaluating, the government’s performance. The White House wants to convey that the high school graduation rate in 2015 is the highest it has ever been, and implicitly to highlight this as an achievement of the government.

Is the chart meeting the purpose?  To a certain extent, and for people without a keen eye for detail, yes: this chart serves the purpose. However, it does not represent high school graduation data in its entirety and leaves room for speculation.

Critique

  • The type of chart used is unclear. It is a column-like chart drawn as stacks of books with a 3D effect. 5 books represent 75% and 16 books represent 82%, which is quite absurd: the number of books more than triples while the rate rises by only seven percentage points.
  • The graduation rate is represented as a percentage. A percentage of what? I am assuming that it is relative to the number of students enrolled in 12th grade.
  • Thinking further, I would want to know whether the number of students enrolled in 12th grade has changed. I am assuming the proportion of high-school-aged people in a given population does not change drastically over the years, so as the population grows, the number of kids in the high school age group should also increase. If enrollment has not increased over the years, the chart is misleading.
  • Also, this chart gives no information about dropouts. For example, suppose a school has 120 students in 11th grade, of whom 20 drop out. If 99 of the 100 students promoted to 12th grade pass their exams, the high school graduation rate could be reported as 99% (99/100) or as 82.5% (99/120).
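To make the last point concrete, here is a minimal sketch of that arithmetic, using only the numbers from the example above. The function name graduation_rate and the variable names are made up for illustration; the point is simply that the same 99 graduates produce two very different percentages depending on the base.

```python
def graduation_rate(graduates: int, base: int) -> float:
    """Return graduates as a percentage of the chosen base population."""
    if base <= 0:
        raise ValueError("base must be positive")
    return 100.0 * graduates / base


# Figures from the example above: 120 students in 11th grade, 20 drop out,
# 100 are promoted to 12th grade, and 99 of them pass.
grade11_cohort = 120
dropouts = 20
grade12_enrolled = grade11_cohort - dropouts
graduates = 99

rate_vs_grade12 = graduation_rate(graduates, grade12_enrolled)  # 99 / 100
rate_vs_cohort = graduation_rate(graduates, grade11_cohort)     # 99 / 120

print(f"Relative to 12th-grade enrollment: {rate_vs_grade12:.1f}%")
print(f"Relative to the 11th-grade cohort: {rate_vs_cohort:.1f}%")
```

A chart that does not say which base it uses leaves the reader unable to tell these two cases apart.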

Betterment – In chart-making, choosing the appropriate form to represent the data at hand is of utmost importance. Ideally, a line chart is suitable for showing subtle changes in rates over time. However, for high school graduation rates there are several parameters involved. For a given year, I would like to see the number of people aged 17 to 21 and the percentage of them with a high school diploma. To represent these details, I would use a stacked bar graph. The Y axis represents the population (number of people aged 17 to 21) and the X axis represents the year. Each bar is divided into two stacks of different colors, one for the number of people with a high school diploma and one for the number without.
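As a rough sketch of the data behind such a stacked bar, assuming we had one count of 17 to 21 year olds and one count of diploma holders per year, the two stack heights could be derived as below. The names YearCount and stack_segments are hypothetical, and the numbers are made-up placeholders, not real statistics; they are chosen only so that the shares come out at 75% and 82% like the figures in the tweet.

```python
from typing import NamedTuple


class YearCount(NamedTuple):
    year: int
    population_17_21: int  # people aged 17 to 21 in that year
    with_diploma: int      # of those, how many hold a high school diploma


def stack_segments(row: YearCount) -> tuple[int, int]:
    """Split one bar into (with diploma, without diploma) segment heights."""
    if not 0 <= row.with_diploma <= row.population_17_21:
        raise ValueError("with_diploma must be between 0 and the population")
    return row.with_diploma, row.population_17_21 - row.with_diploma


# Made-up placeholder numbers for illustration only, NOT real statistics.
sample = [
    YearCount(year=1, population_17_21=1000, with_diploma=750),
    YearCount(year=2, population_17_21=1200, with_diploma=984),
]

for row in sample:
    have, lack = stack_segments(row)
    share = 100 * have / row.population_17_21
    # Text rendering, one character per 50 people: '#' = diploma, '.' = none.
    bar = "#" * (have // 50) + "." * (lack // 50)
    print(f"year {row.year}: {bar} {share:.0f}% with diploma")
```

Because the total bar height is the population, the reader sees both the growth in the age group and the share holding a diploma, which the book chart hides.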

 

References:

Washington Post – Highschool graduation rate hits an all time high

Whitehouse archives – More students are graduating than ever

Where Do EPL Players Come From?

In sports, a team’s goal is to be successful. In soccer, as in many sports, success means winning as many games as possible to make it to the postseason and eventually win the championship.

How does an organization build a championship winning team? There are a lot of factors that can make an impact and data can help influence recruiting decisions.

Putting ourselves in the shoes of a recruiter, we’re looking to put together a new star team in the EPL (English Premier League). One of the best ways to learn is by looking at history. We are lucky enough to have data on the players currently in the league.

The objective is to find what would make an ideal recruit for our team, using data alone. We want to find the optimal player profile that will help us have a successful season.

Location is important and this dashboard can tell us a lot about where to look for players:

https://www.tableau.com/solutions/workbook/create-optimal-game-strategies-based-past-results

According to the dashboard:

  • Most players have spent time in other countries, but the largest share of playing time (>50k days) was in Europe; the conclusion is that players have experience in European leagues or the EPL
  • EPL recruitment is concentrated in the EU, but also pulls from other countries worldwide
  • When looking at the club breakdown, most patterns look similar, with a cluster in the EU and variance in the outliers (players from the US, Spain, South America). It is hard to correlate these patterns with current standings.
  • Birth country shows that the EU isn’t the only location dominating the top – players are born in Senegal, Brazil, Argentina, and Nigeria (even though no players were directly recruited from Africa)
  • The author digs further into Africa, revealing that a significant number of players born in Africa play in the EPL regardless of their recruiting or development country. This highlights that players from Africa often come to Europe, frequently through France, are developed there, and are then recruited from Europe

Based on these common trends, it is fairly conclusive that the focus should be on recruiting players who have done their development work in the EU, but who have South American, African, or European backgrounds.

This dashboard covers location, but it is not enough to tell us what makes the perfect player.

It does not include individual player performance, or how the combination of location and performance builds success (if someone had figured that out, recruiters wouldn’t be needed). The NCAA makes some recommendations on the body fat and other characteristics a higher-level soccer player should have (among other sports):

http://www.ncaa.org/health-and-safety/sport-science-institute/body-composition-what-are-athletes-made

Pros

  • Many different types of visuals are used
  • The “story” aspect of Tableau is used to direct attention to the point the author is trying to make
  • A lot of good data on what teams are already doing
  • Compares different recruiting patterns, but doesn’t show how they impact team performance
  • The visuals are all straightforward. It is not confusing to understand any page of the dashboard.

Cons

  • Doesn’t let you explore options beyond what teams are already doing; it is only an observation of current practice (e.g., what happens if I expand to recruiting in Antarctica?)
  • Can’t focus in on a specific team, so there isn’t enough information to see how expanding recruiting to new markets has affected performance (e.g., Everton recruiting in South America vs AFC Bournemouth staying closer to the EU)
  • Doesn’t include rankings or team performance

How would I change it?

  • Presuming the goal is to provide insight that one can act on, more data points need to be added alongside location. Location on its own is only an observation. Adding player characteristics, team characteristics (resources), and performance would give the background and context needed to explain patterns and correlations between players, teams, and performance over time, which could then be turned into recommendations.

How else could I use this data?

  • If I were hoping to someday become a player in the EPL, I could make strategic decisions about where to play (or where to do my development) in order to increase my chances of getting into the league, based purely on the geographic indicators in this data
  • If significant correlations are seen between players’ birth countries and their performance in the EPL, development investment could be made in those countries (e.g., if players born in Africa are high performing, how do we optimize?)