NBA Player Statistics

Link to the dashboard: http://mikerazar.com/chart-it/2015/01/14/nba-player-statistics-dashboard/

Background

The author uses three dashboards to compare NBA players on their statistics: offensive metrics, defensive metrics and other metrics. According to his explanation, in every graph the x-axis represents the player’s career seasons and the y-axis represents one NBA statistic, such as field goal percentage or points per game.

What I like:

This dashboard gives a clear view of the data by using line charts to show the differences between players. For example, in the free throw percentage chart of the offensive metrics dashboard, we can clearly see that Kobe Bryant and Michael Jordan have better free throw shooting accuracy than LeBron James.

What I don’t like:

First, I don’t think it is a good idea to set the x-axis as the player’s career seasons. The author should think about who his audience is going to be: NBA teams or the public. The setup is fine if the audience is the public, because they will be able to see who the best player is based on lifetime value. But if the audience is NBA teams, it would be more valuable to change the x-axis to the actual regular season, such as Season 2014-2015 or Season 1997-1998. The reason is that the game intensity in the past was different from what it is now. Kobe Bryant may have faced tougher defenders than Michael Jordan did. So the information is of very limited use for team trading purposes.

Second, as I mentioned previously, the game intensity in each period is very different. Also, some players focus more on offense and some focus more on defense. So it is hard for the audience to compare the players across so many charts. In other words, when the audience wants to determine the top offensive player, which graph should they pick: points per game or assists per game? A better way would be to give each player a total score and produce one final graph. Namely, the author can assign a coefficient to each statistic based on its importance and, for example, compute a total offensive score for each player. That would let the audience make a pick easily.
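To make the total score idea concrete, here is a minimal sketch of a weighted sum. The weights, the stat names and the two stat lines are made up for illustration; they are not taken from the dashboard or from real NBA data.

```python
# Hypothetical weights: how much each offensive statistic counts toward the total.
OFFENSIVE_WEIGHTS = {
    "points_per_game": 0.5,
    "assists_per_game": 0.3,
    "field_goal_pct": 0.2,
}


def offensive_score(stats, weights=OFFENSIVE_WEIGHTS):
    """Return the weighted sum of a player's statistics."""
    return sum(weights[name] * stats[name] for name in weights)


# Made-up stat lines for two placeholder players (not real NBA data).
players = {
    "Player A": {"points_per_game": 30.0, "assists_per_game": 5.0, "field_goal_pct": 50.0},
    "Player B": {"points_per_game": 25.0, "assists_per_game": 7.0, "field_goal_pct": 45.0},
}

# Rank players from highest to lowest total offensive score.
ranking = sorted(players, key=lambda name: offensive_score(players[name]), reverse=True)
for name in ranking:
    print(name, round(offensive_score(players[name]), 2))
```

Because the statistics are on different scales, a real version would probably normalize each one before weighting it. The choice of weights is a judgment call and would have to be explained next to the final graph.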

Conclusion

This is a good dashboard that presents its data clearly to the audience. However, the author should think more about who his audience is and what would be a more efficient way to present the data.

Reference:

http://mikerazar.com/chart-it/2015/01/14/nba-player-statistics-dashboard/

http://scholarship.claremont.edu/cgi/viewcontent.cgi?article=2302&context=cmc_theses

The Beautiful Game!

Football (soccer) is the most popular sport in the world; FIFA has more member countries (211) than even the UN (193). This goes to show how sport has managed to unite people more than anything else. Moreover, the FIFA World Cup is the biggest event in the world, with over 3.2 billion viewers worldwide. Now on to the biggest debate in the football universe: who is the best, Messi or Ronaldo? Let’s hope we can find out by the end of this project.

Just like any other football fan, I have always been perplexed by the question of who should be on top. The last nine Ballon d’Ors have been won by one of the two, with Messi winning five and Ronaldo four. Many believe both to be on the same level, and that can be seen in the phenomenal records they have shattered over the years. Still, my curiosity made me want to find the slightest differences that would place one over the other. In my search for the answer I came across an abundance of data that had to be filtered to get the desired results. One of the biggest challenges I faced was selecting my audience. People who would want to know the answer include fans, managers, journalists, owners and media. World Soccer, FourFourTwo and When Saturday Comes are the most popular football magazines in the world and would be a good medium to spread the claim of this project. These magazines are generally read by people who know quite a lot about football, which makes them the perfect audience for this dashboard.

The claim that I wish to prove with the help of these dashboards is that Messi is the best player of the last decade. I first compare the top ten players and then move on to Messi vs Ronaldo. Comparing any two things is easier said than done. The comparison must be based on multiple dimensions to strengthen the claim. Goals, assists, trophies, awards and salary felt like the natural choices. Being a football fanatic myself, I found it hard to stay unbiased, and I tried to for as long as I could until my worksheets started telling a story that was aligned with my claim.

 

Dashboard 1- Top ten players of the last decade

https://drive.google.com/open?id=0B6xsR3HojSRgSnNoQmkzUFBCa3M

Before comparing Messi with Ronaldo, it is important to first look at the top ten players of the last decade. This will show us why Messi and Ronaldo are above the rest and only then a comparison of the two is justifiable. Clearly this dashboard gives you a lot in a single glimpse.

On the top, we have statistics for all the players in a drop-down list manner which include overall rating, vision, curve etc. Bottom left corner has a soccer field in the background with images of players on top of it. Behind the images are pie charts that depict the percentage of matches won by each of them to show how much their teams win when they are playing. Next to it is a sheet with bubbles to show the overall ratings along with the number of goals they have scored in their careers. The last sheet is a world map with faces of players on the countries they are from.

Overall one can get a good impression of this comparison between the top ten players with the countries they are from, goals they have scored, their winnability and all major skill statistics. Now we shall have a look at each of these sheets in depth.

 

1.Player Stats:

https://drive.google.com/open?id=0B6xsR3HojSRga1ZQenUxUGRRRGM

This sheet alone covers a wide variety of skills for each player. A drop-down list felt like the perfect choice for this kind of data, since there are over 20 skills and they vary for each player. One can choose the player whose skills they want to view and the dashboard will change accordingly. Data for this sheet was picked from the FIFA 17 game website, since only they provide the figures for these skills. I could not find a data set that I could use Python on, so I had to select each player and write his stats in an Excel sheet. Some stats might differ from those on the FIFA website, as I have modified some of them depending on how good a season the player has had.

Critique- First I used different colors for each stat but it did not look appealing to the eye so I changed it to different shades of the same color.

2.Goals vs Rating-

https://drive.google.com/open?id=0B6xsR3HojSRgOXMzVlFTeFdsajA

This sheet is useful as it helps us differentiate among the players. The size of each bubble depends on the number of career goals the player has scored, shown along with his overall rating. We can see that Ronaldo and Messi have the biggest bubbles when compared with the others. Ibrahimović has a big bubble too, but he has not won as many individual or team championships as Messi or Ronaldo.

3.World map

https://drive.google.com/open?id=0B6xsR3HojSRgU1VQU2t2bVhIdkE

We can see where each of our top ten players comes from, with most of them from Europe or South America. These two continents have dominated football ever since it became a global game. The picture of a player above a country tells you which country he represents.

Critique- Major downside in this sheet is the overlapping of player images on top of each other since the countries they represent are smaller in size and closely packed in clusters.

4.Winnability-

https://drive.google.com/open?id=0B6xsR3HojSRgVklUb05ua1UzLWs

This is certainly a beautiful sheet, with a background image of a football field and player images on top of pie charts. Putting in a background image was an easy task, but putting images on top of pie charts required patience and numerous failed attempts. The pie charts represent the percentage of matches that each player has won, drawn or lost in his club career, since it is the club that pays him. Winnability says a lot about a player’s ability to win matches for his club and about how well his team performs when he plays.

A dataset for this sheet was nowhere to be found, so I had to go back through each of their careers individually and find these values since 2007. I chose the 3-4-3 formation as it is the formation of the world eleven and has been successfully adopted this season by Chelsea, Arsenal, Barcelona and Juventus.

 

Dashboard 2: Messi vs Ronaldo (Goals and assists)-

https://drive.google.com/open?id=0B6xsR3HojSRgUWNIZEJjNWgwYlk

The dashboard compares the goals and assists provided by Messi and Ronaldo. In the goals sheet the two are pretty close to each other, with Messi scoring a tad more than his rival since 2009. In the assists sheet we see a huge gap between them, where Messi’s line never goes below Ronaldo’s. Data for this dashboard was relatively easy to find and needed just some cleaning with Python.

Showing an attribute over time works better with a line graph than with a bar graph, because one can clearly see how the attribute changes as time passes. Clearly Messi is our winner for this round, being the top goal scorer and playmaker of the past decade.

What these two are achieving year after year is unbelievable. What is even more remarkable is that Messi is two years younger than Ronaldo, which means he could potentially play longer than Ronaldo.

 

Dashboard 3: Messi vs Ronaldo (Trophies and Awards)-

https://drive.google.com/open?id=0B6xsR3HojSRgNjZZYXhJejUzRU0

Let us now look at the most prized possessions in football which are trophies and awards. Both have a trophy cabinet that might just be bigger than most clubs in Europe.  Both are born winners and such a rivalry may never be seen again. In the first sheet, we have the championships that both won with their clubs or country and under it we see the individual prizes.

Clearly Messi has won more trophies than Ronaldo, even though Ronaldo played part of his career in England, winning the Premier League thrice with Manchester United. If we look at Spain, Messi has won fourteen more tournaments. Even in club football’s biggest competition, the UEFA Champions League, Messi prevails. Ronaldo won the EURO last year with Portugal, whereas Messi’s Argentina twice lost the final of the Copa America to Chile. Nevertheless, Messi has an Olympic Gold to his name along with the FIFA U-20 World Cup.

Messi also has more individual trophies, with five Ballon d’Ors to his name, the most in the history of the game. Both have the same number of European Golden Boots and Best Player in Europe awards. The Spanish league (La Liga) is where Messi comfortably beats Ronaldo, with five more best player in La Liga awards and three more top goal scorer awards in the Copa del Rey. Messi also holds one award that even some of the greatest players ever might not have: the World Cup Golden Ball for the best player.

In these sheets, I first set a background image and then plotted each of the trophies as points on an X-Y plane over the image. I then replaced those points with images of trophies that had to be cropped in Photoshop. It required many hours of manual labor, but in the end it was worth it, as the output is quite beautiful to look at.

Dashboard 4: Messi vs Ronaldo (Value for Money)-

https://drive.google.com/open?id=0B6xsR3HojSRgUWZIQ1ZsWE41eDA

Football, like any other sport, is a game of money. We should end the comparison by seeing how much each of them cost his club and who generated more value with his performance. The left-hand side has two sheets depicting their salaries over time. The right-hand side shows a line graph of weekly salary and a bar graph of how much they were paid per goal or assist.

The bigger the gap between the line graph and the bar graph, the more value that player has generated. Take the example of this season, where FC Barcelona have had to pay less than 400k for each of Messi’s goals or assists, whereas Real Madrid have had to pay over half a million for each of Ronaldo’s goals or assists. The gap on Messi’s graph is bigger, which makes him more valuable to his club. Values for the bar graph were created by dividing the salary by the total number of goals and assists in that season. The sheets were made from two different data sets, one with yearly salaries and the other with weekly salaries. Ronaldo’s salary increasing every year from 2010 to 2013 does not mean he signed a new deal every year; rather, his initial contract had a clause raising his pay every year by some percentage based on his performance.
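The per-goal-or-assist calculation described above can be sketched as follows. The function name and the example figures are placeholders chosen for illustration; they are not the actual salary or scoring numbers behind the dashboard.

```python
def cost_per_contribution(season_salary, goals, assists):
    """Salary paid in a season divided by goals plus assists in that season."""
    contributions = goals + assists
    if contributions == 0:
        # Avoid dividing by zero for a season without any goal or assist.
        raise ValueError("no goals or assists in this season")
    return season_salary / contributions


# Placeholder figures, not real data for either player.
example = cost_per_contribution(season_salary=20_000_000, goals=40, assists=10)
print(f"{example:,.0f} per goal or assist")
```

A lower result means the club paid less for each goal or assist, which is the sense in which the text calls a player more valuable.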

In the end, we have seen Messi overcoming Ronaldo on all fronts, be it the number of trophies, awards, goals, assists or value for money. Four dashboards may sound like a lot, but we needed all of them to prove our claim. Each of those sheets helped strengthen our position. Based on this project, we can say that Messi is the best player of the last decade.

During the process of data wrangling I came across a huge problem: Jupyter was not giving me permission to create a new file, so I had to use the class lab one file to clean the data. After most of my data wrangling was finished, my laptop kept freezing, as it has ever since I had to install Ubuntu on Oracle VirtualBox for the Big Data class. So the file somehow did not get saved, and it still opened the old lab one file.

Critique-

Given the knowledge and experience that I had of Tableau when starting this project, I must say I am quite happy with my efforts. However, this is just the first version and there can be improvements over it in the future.

There are millions of websites dedicated to providing information on a renowned topic like soccer. Hence, the biggest problem was validating the data that I found. Data would often vary between websites, so to get the correct set, numerous more websites had to be searched until I was sure about the facts in that dataset.

In places this dashboard looks beautiful, but overall it lacks the refinement and maturity to be deemed a brilliant one. A comparison based on their popularity on various social media platforms is something that is missing. Even though it would not change our claim, it would have given a new perspective.

Also missing is their record against good teams and poor teams, to see how well they perform against better opponents, since scoring against bottom teams is often easier than scoring against top teams. Such a dashboard would help us understand how often they deliver when the going gets tough. It is their impact against the top teams that people fondly remember and judge a player on.

Another thing missing is a dashboard to show how many goals they have scored with different body parts. This includes goals scored with each foot, the head, and any other body part except the hands, to show their ability with both feet and head.

Scoring from outside the box, from a free kick or from inside the box tells you a lot about the player. Many people call Ronaldo “Penaldo” for his knack of scoring from the penalty spot more often than Messi, who generally scores more in open play. Scoring headers is something you won’t associate with Messi due to his smaller height. So we see how these details can play a role in judging a player’s potential and performance on the pitch.

To conclude, it was a great experience to work with Tableau and learn some of its features. Tableau is a powerful tool for displaying your message in a way that speaks louder than words ever can. Choosing a topic that I had first-class knowledge of did not make my job easy; rather, it left me muddled more often than not. With the help of the data that I found and with the use of these dashboards, the message is loud and clear as to who the best player of the last decade is: Lionel Messi.

 

References:

https://fivethirtyeight.com/features/lionel-messi-is-impossible/

https://www.kaggle.com/hugomathien/soccer

http://messivsronaldo.net/

http://messivsronaldo.net/records/

https://grup14.com/column/messi-vs-cristiano-ronaldo-what-defines-a-big-game-player

https://www.fifaindex.com/

https://en.wikipedia.org/wiki/Lionel_Messi

https://en.wikipedia.org/wiki/Cristiano_Ronaldo

https://en.wikipedia.org/wiki/Andres_Iniesta

https://en.wikipedia.org/wiki/Eden_Hazard

https://en.wikipedia.org/wiki/Mesut_ozil

https://en.wikipedia.org/wiki/Zlatan_Ibrahimovic

Github: https://github.com/viraaj589/MI2

Word Cloud as a tool for visualization

Source:  https://www.census.gov/dataviz/visualizations/007/

Description: This word cloud visualization presents the information about the cities that have ever been listed as top 20 most populous cities in the country, since 1790. The size of each city name reflects the number of times that city has been ranked in the top 20.

What are the pros of using word cloud as a tool for visualization:

1. The good thing about a word cloud is that it reveals the essentials: the key words pop, and a reader can easily pick them out.

2. Word clouds are easy and fast to make compared to other forms of visualization.

3. Word clouds are very engaging because their visual representation of data tends to make an impact and generate interest among its audience.

What are cons of using word cloud as a tool for visualization:

1. One of the major drawbacks of a word cloud is that its display emphasizes the frequency of words, not necessarily their importance.

2. Word cloud generally categorizes the words by making difference in their size, or their frequency of occurrence, but the design of the words such as white space between the characters, or use of bold font can make it appear more or less important relative to others in the cloud. This can mislead the viewer’s perspective.

Let’s move on to the critical analysis of this visualization:

1. Is the visualization fulfilling its purpose? – Since the goal of the visualization is to show the number of times these cities ranked in the top 20 most populous cities in the US since 1790, using the primary feature of the visualization, i.e., the font size, to convey that information makes sense. But there should be a way to convey another important piece of information about the cities, namely their current population, to remove the confusion described in the rebuttal below. At the end, I describe some ways to address this.

2. Audience:  I guess it is meant for the general public living in the USA. If it is meant to serve the US government or to help a survey like the CPS, it defeats its purpose because of the weaknesses mentioned below.

3. Claim: This visualization claims to present the top 20 most populous cities in the United States from 1790 to 2010.

4. Rebuttal includes the following points:

Misleading font sizes: Counter-intuitive relative font sizes of different cities. Explained by the following  examples:

1. A city with a smaller font size is more populated than cities with a larger font size: Even though Los Angeles is the second most populous city in the US, its font size is much smaller than that of cities like Baltimore and Boston, whose population is one-sixth of that of Los Angeles. The reason is that LA’s population grew recently while the data presented is historic (1790-2010). Hence, the plot does not take into account the current (or recent) population numbers of these cities. This makes viewers think that Baltimore and Boston are much more populous than LA even though the case is exactly the opposite. Hence, it cannot convince its viewers.

2. Cities with similar font sizes differ enormously in population: Likewise, the similar sizes of New York, Baltimore and Boston give the impression that the populations of these cities are comparable. However, New York is approx. twelve times more populous than either of the other two cities. Hence, the font sizes are completely uncoordinated with the current population numbers of these cities.

Hence, I would not categorize this visualization as truthful, because it is deceptive. Nor can it be counted as insightful or enlightening: because it ignores the details mentioned above, it neither provides any new information to the audience nor initiates any change.

What could be done better?

The visualization has two main features:

  1. Font size of the cities.
  2. Font color of the cities.

As can be seen from the visualization, both of the above features are used to convey the primary goal/statistic, which is the number of times these cities ranked in the top 20 most populous cities in the US since 1790. Cities with a higher value of the primary statistic have a bigger font and a darker shade of green, and vice versa for cities with a lower value of the primary statistic. For example, Los Angeles is smaller and lighter compared to Baltimore.

My suggestion:

Instead of using both features (font size and font color) to convey the same information, i.e., the primary statistic, use one of them to convey the recent population numbers of the cities. I would like to use font color to convey recent population numbers, using a visualization somewhat like Figure 1. In the figure, the font colors are varying shades of red. The darker the font color, the more populated a city is. The larger the font size, the greater the primary statistic for that city. Some quick observations from the figure:

1. Los Angeles has the lowest primary statistic; hence, it has the smallest font size.

2. Baltimore has the least population; hence, it has the lightest shade of red.

3. The populations of New York and Los Angeles are closer to each other than those of any other city pair; hence their font colors are very similar. But since the two cities have the largest difference in their primary statistic values, their font sizes differ more than those of any other city pair.

The advantage of this visualization scheme is that it effectively ties the primary statistic of all the cities to their recent population numbers. This puts a check on the confusion arising from font sizes that are uncoordinated with the current/recent population numbers. So, on one hand, the viewer sees the contrasting font sizes of New York and Los Angeles and infers the large difference in the corresponding primary statistics. At the same time, the viewer notes the striking similarity between the font colors of the two cities, which should prompt him to think for a while (and read the description) to infer the closeness in the population numbers of the two cities. All these changes, if implemented, can make the visualization more convincing, truthful, enlightening, and insightful.
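As a rough sketch of this two-channel scheme, the snippet below maps the primary statistic to a font size and the recent population to a shade of red. The function names, the 12 to 72 point size range and the color range are my own assumptions for illustration, and the inputs in the example are abstract numbers rather than census data.

```python
def scale(value, lo, hi, out_lo, out_hi):
    """Linearly map value from the range [lo, hi] to [out_lo, out_hi]."""
    if hi == lo:
        return out_lo
    return out_lo + (value - lo) * (out_hi - out_lo) / (hi - lo)


def word_style(times_in_top20, population, count_range, pop_range):
    """Return (font size in points, hex color) for one city name."""
    # Font size encodes the primary statistic only (assumed 12 to 72 points).
    font_size = scale(times_in_top20, count_range[0], count_range[1], 12, 72)
    # Color encodes population only: pale red (#ffc8c8) for the smallest
    # population, full red (#ff0000) for the largest.
    channel = round(scale(population, pop_range[0], pop_range[1], 200, 0))
    color = "#ff{0:02x}{0:02x}".format(channel)
    return font_size, color


# Abstract example: a city in the middle of both ranges.
print(word_style(10, 100, count_range=(0, 20), pop_range=(0, 200)))
```

Keeping the two mappings in separate lines of the function makes the design choice explicit: size never depends on population and color never depends on the ranking count.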

Conclusion:

Simple is not always better: When I noticed the weaknesses of this word cloud, I realized that simple is not always better. Though word clouds are easy to make and can be easily interpreted, mistakes such as using two features (size and color) to represent only one dimension may mislead the viewer’s interpretation.

 

 

 

 

Retail Apocalypse

Over the past few years, a growing amount of attention has been paid to the troubles that the retail industry has been facing. As many reports have shown, the retail industry has been negatively affected by the rise in mobile and online shopping, the downfall of shopping malls, poor financial decisions, and increased debt, which has led to more store closures, bankruptcies, and a loss of jobs in the industry.

This is a graph that shows how many stores various retail chains have either closed or were expected to close during the first quarter of 2017. The article uses this graph to illustrate the “retail apocalypse,” the phrase used to describe the struggles that various retail brands and the entire retail industry have been going through. The graph lists different companies, and then puts a bar and a number for the number of closing stores. The graph has been sorted from the most closing stores to the fewest.

The claim/argument that this graph is trying to make is that the closing of all these stores is proof that the retail industry is in big trouble. The graph’s warrant is that, since a lot of retailers, many of which used to be very large (Sears, Macy’s, JCPenney, RadioShack), have been forced to close stores in the same time frame of early 2017, not only are these specific retailers in trouble, but all retailers are in trouble. The backing is the actual number of stores that have been closed. The graph doesn’t have a clear qualifier, but makes an assumption that closing stores is automatically a sign that a retailer is having financial difficulties, and it has no rebuttal. The action that the graph calls for is that something should be done to help the retail industry.

The aspect of this graph that I like the best is its aesthetic value. The graph has a very simple and clean design that makes it very easy to read, and makes it easy to come away with the graph’s intended argument. It feels like there is just the right amount of information being presented in the graph to prevent the information overload problem that some graphs have. The graph does not have any pointless gimmicks (no 3D, no shadows, etc.), and the blue on grey background is pleasing enough on the eyes. This also feels like a graph that can be presented to, and is accessible to, just about any audience, from industry leaders to people with no industry knowledge. It shows very clearly that retailers are closing stores, and thus, that the retailers are in trouble. The graph also clearly references where it got its data.

There are several problems with this graph that keep it from effectively making its desired argument. The main problem is that while its information may be accurate (in terms of the number of stores being closed), the information strikes me as incomplete, and thus may not be truthful or insightful about the state of the retail industry. For example, the graph does not show how many stores these retailers closed in the early parts of 2016, 2015, and so forth. If RadioShack, for example, closed at least 600 stores at the same time last year, then it could be argued that while RadioShack is still in trouble, it may not be in as much trouble as it was before. This problem could be solved by either adding a second graph or adding extra bars to compare the number of stores closed in prior years. Another missing piece of information is how many stores are being closed in comparison to the total number of stores each retailer has. While Wet Seal closing 171 stores sounds bad, if this number is only, say, 5% of Wet Seal’s total number of stores, then we could again argue that the retailer might not be in serious trouble. One might visualize this by using a stacked bar, with the number of closed stores inside a bar representing the total number of stores.
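The two missing comparisons above come down to simple arithmetic, sketched here. The total store count is a hypothetical figure picked only to reproduce the "say, 5%" case from the text, and the year-over-year numbers are placeholders for an unnamed retailer; none of them are real store counts.

```python
def closure_share(closing, total_stores):
    """Fraction of a retailer's stores that are closing."""
    return closing / total_stores


def change_from_last_year(closing_this_year, closing_last_year):
    """Positive means more closures than last year, negative means fewer."""
    return closing_this_year - closing_last_year


# 171 closings comes from the graph; the total of 3420 is hypothetical.
share = closure_share(closing=171, total_stores=3420)
print(f"{share:.1%} of stores closing")

# Placeholder counts for an unnamed retailer.
print(change_from_last_year(closing_this_year=500, closing_last_year=600))
```

Either number could be plotted next to the raw closure count so that a reader can tell a large chain trimming a few stores from a small chain shutting most of its locations.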

In addition, it is possible that these retailers had too many stores to begin with, so closing stores might not be a sign of trouble, but a necessary step to become more efficient, and thus a good thing. Another missing piece of information is whether these and other retailers also opened any stores in the same time frame. If other retailers have been opening more stores during this or past time frames, then that would counter the argument that the whole industry is trending down. Another problem with the graph is that the time frame of “Early 2017” is unclear, possibly too small a time frame, and could misrepresent the data even further. In addition, this graph does not factor in other metrics of industry strength, such as number of jobs added and profits. Also, I don’t think CVS really fits in with the other retailer examples. The graph also lacks causality mechanisms, as it doesn’t give any explanation as to why the stores are closing. This could be solved with a complementary graph that shows the rise in online sales, for example.


Retailers stores closing 2017

 

http://www.businessinsider.com/the-retail-apocalypse-has-officially-descended-on-america-2017-3

The U.S. Job Market Is On A Historic Growth Streak

https://www.forbes.com/sites/paularosenblum/2017/05/01/five-reasons-why-the-retail-apocalypse-is-a-red-herring/#5d5b1eb561fa

https://www.theatlantic.com/business/archive/2017/04/retail-meltdown-of-2017/522384/

How Much You Must Earn to Buy a Home

I actually found this visualization while searching for data for the team project, and it looked like a good candidate for a blog post. Here is where it can be found https://howmuch.net/articles/how-much-must-earn-to-buy-a-home-metro-area
This graph, published on the website, shows major metropolitan areas across the US and how much you must earn to buy a house in each of them. It is a very weak visualization in my opinion because of the following factors:
1. The presentation of the map itself has a nice 3D effect going on. We have discussed this in class and came to the conclusion that 3D for any kind of graph is just an unnecessary complication that only distracts the reader from the message.
2. The scale is a nice thing to provide in the legend, but it doesn’t match the map. San Francisco, for example, peaks at 147k, yet the bar for San Francisco is higher than the legend’s 150K. It could be caused by the 3D effect, but I did measure it with a ruler and it is actually out of scale.
3. Uneven load on the sides compared to the middle of the map. San Diego and Los Angeles as well as Philadelphia and Washington are obstructing each other’s views and make it even harder to read.
4. The bars are drawn as cones, which makes them much harder to read. The pointy ends of the cones are not very distinguishable from the map itself. In addition, cones tend to distort the proportions.
5. The color combination of the whole thing is just bad, and I am not only talking from an aesthetic point of view. The contrast between the map and the bars is weak, so the ends of the bars blend into the map itself, which is very bad for a visual representation of information. Also, the west coast bars look darker than the east coast bars, yet according to the legend color is not a property of the bars, so the colors should be the same across the map. It probably has something to do with the shadow effect of the 3D map.
6. The creators of the visualization probably realized all the visual downfalls to some degree, so they labeled each bar with its value. So my question is: what is the point of a visualization if you have to rely on labels to show the value of each data point? I don’t think this constitutes a good use of visualization.
7. The most confusing part, however, is the visualization of the median house price. It is logical to think that green is good and red is bad. But in this case the colors do not have this meaning; at least I hope they don’t. San Francisco is shown in red while Detroit is shown in green, so one could conclude that Detroit is a better place than San Francisco to buy a house. But Detroit is a ghost town with high unemployment and crime rates. And even considering the lower housing prices, it might actually be harder to buy a house there on a Detroit salary.
8. The final issue is that the two parameters reported (the median home price and the salary needed to buy a house) are tied together, unless the authors do some fancy calculations (which they don’t) or take into consideration differences in property taxation among states (which they also don’t). In other words, a 100k a year salary will be required to buy a house in an area with a median house price of 550k no matter where it is located on the map. So what is the point of including them both on the map, especially considering how hard it is to read the color-coded information (the color gradient is very crude and the map shadowing affects the colors as well)?
In conclusion, this is one of the graphs that would be better presented in plain text or as a series of bar charts next to each other. If users have problems with geography, it is best to include the map separately so they can look up places of interest individually.

Healthiest Cereals

An American invention, breakfast cereal began as a digestive aid, acquired religious overtones, became a sugary snack and now toggles between health food and sweet indulgence. Throughout that history, it has mirrored changes in the world beyond the breakfast table. Cold cereals are an easy, convenient food for busy people. Many boast impressive health claims, or try to promote the latest nutrition trend.

But are these cereals really as healthy as they claim to be?

These products are NOT always healthy just because they have small amounts of whole grains in them. These are highly processed foods that are loaded with added sugars. Small amounts of whole grains do not negate the harmful effects of the other ingredients. If you must eat breakfast cereal, make sure to read the ingredients list and be wary of front label health claims. Choose cereals that are high in fiber and low in sugar.

Data show that consuming commercial breakfast cereals can increase overall nutrient intake and lead to a healthier weight – when compared to those who skip breakfast. It’s hard to conclude much from studies on eating cereal and body weight, due to the multitude of other factors. Most of us can acknowledge that a majority of cereals are edible entertainment. Cereal consumption usually has little to do with nutrition and a lot to do with advertising, characters, images, and convenience. A look at this visualization can help us make better decisions next time we are shopping for breakfast cereals.

https://public.tableau.com/profile/mike.pullen#!/vizhome/Cerealdata_0/Dashboard1

What I like about the visualization

The visualization is simple and functional: It represents the key nutrition criteria you should pay attention to, based on the big three: sugar, salt and fiber. A healthy cereal is one with high fiber (4 g or more), low sugar (10 g or less) and low sodium (140 mg or less).

The visualization makes it easy to compare and contrast the various options available: Cereal is big business. Each new cereal production technique is a marketing strategy. This visualization makes it easy to compare the various options and make a wise decision based on the ingredients they contain.
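The three thresholds quoted above (fiber of 4 g or more, sugar of 10 g or less, sodium of 140 mg or less) can be written as a small check. This is only a sketch: the function name, the per-serving units and the two example cereals are assumptions made for illustration, not data from the dashboard.

```python
# Thresholds quoted in the text; all names below are illustrative.
MIN_FIBER_G = 4
MAX_SUGAR_G = 10
MAX_SODIUM_MG = 140


def is_healthy(fiber_g, sugar_g, sodium_mg):
    """Return True when a serving meets all three criteria."""
    return (
        fiber_g >= MIN_FIBER_G
        and sugar_g <= MAX_SUGAR_G
        and sodium_mg <= MAX_SODIUM_MG
    )


# Made-up servings (fiber g, sugar g, sodium mg), not real product data.
examples = {
    "hypothetical bran flakes": (5, 6, 130),
    "hypothetical frosted puffs": (1, 14, 180),
}
for name, (fiber, sugar, sodium) in examples.items():
    verdict = "healthy" if is_healthy(fiber, sugar, sodium) else "not healthy"
    print(name, "->", verdict)
```

A cereal has to pass all three tests at once, which is why a single tick or cross per cereal is a reasonable summary of the three measurements.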

What I did not like about it

The visualization is not very insightful: There are many cereal brands making many types of cereal. Today cereals range from sugary morsels to healthy flakes, to Frosted Flakes and so on. You are the one who decides which to choose among the various options available. The visualization could take more factors into consideration than just the sugar, salt and fiber contents. Details regarding calories, protein, fat, carbohydrates, etc. could be provided to make it more informative.

The visualization lacks aesthetics: The choice of colors for the cross mark and the tick mark on the graph is appropriate and well understood, but the visualization is not eye-catching and does not stand out.

Audience consideration: Companies use bright colors, cartoon characters and action figures to attract children’s attention. Not surprisingly, this causes children to associate breakfast cereals with entertainment and fun, making them the major consumers of breakfast cereals. The graph therefore needs to be understandable and interpretable by children, to help them make wise decisions when shopping for breakfast cereals.

Choosing the right graph: The graph represents the 3 major factors on which the visualization is based. Sodium content is mapped to the X axis and fiber to the Y axis. It is not clear how the 3rd component, sugar, is mapped onto the graph. Another form of graph would be a better representation of the data in hand.

How it can be made better

Designing the visualization with consideration for other components: The following visualization shows the various components for every breakfast cereal under consideration. Increase the aesthetic appeal by incorporating different colors, and increase the usability of the visualization by using the right form of graph.

Redesigning:

https://us-east-1.online.tableau.com/#/site/sapthami/workbooks/67835/views

https://docs.google.com/document/d/1A9_QY_VtbbJtOC-9mJurN7QbMcUlfOygA3WamkSdinU/edit

 

References:

http://andrewmartineau.com/whats-the-best-breakfast-cereal

http://www.precisionnutrition.com/all-about-breakfast-cereals

https://authoritynutrition.com/are-breakfast-cereals-healthy/

Exploring EPL Dashboard

Let’s first look at a dashboard about the Premier League.

https://public.tableau.com/profile/cognitive.dissonance#!/vizhome/EnglishPremierLeague_0/EPLStory

  1. League table and Played Opponents In Prior Seasons: You can pick any team in the table and it will show previous and upcoming fixtures. You can also see a comparison (avg. points, goals, goals against, winning%, draw% and loss%) of playing against the same teams in the league over the most recent eight seasons (it excludes teams that were promoted or relegated). Comparing the same fixtures with prior seasons is, in one way, a fair test of whether a team is really getting worse or better. The author has a distinctive idea. However, there is too much information to convey to the audience. After I reviewed a couple of clubs, I failed to see the difference between clubs or the change within one club, because the data does not change much. Evidently, the author did not check how the data changes through the seasons. Such tiny changes make the dashboard a failure. It feels like the author gave the audience a maze and asked them to find the exit.
  2. Season Progress By Fixture: Comparison is a useful technique in a dashboard. Here you can compare your club with the League Winner, the Champions League line (the top 4 qualify for the Champions League next season), the Europa League line (5th to 7th can qualify for the Europa League) and the Relegation line. You can also select among a few variables: points, goals conceded and away wins. Comparing over time makes it easy for the audience to spot the turning point. For example, if you choose Arsenal and compare it with the League Winner, you can easily see that before fixture 22 its pace was as good as the League Winner’s, but after that it ran out of steam. So this is not as painful as the previous dashboard; we do not need to find the exit of a maze. However, this is still not a perfect dashboard. The author only picks three variables: Points, Away Wins and Goals Conceded. Does he want to send the message that those are the most important measurements? How about goals? Home wins? Games against major clubs? Why did he neglect those indicators? Incomplete information would probably lead to incomplete conclusions.
  3. Final Standing and Points: It’s hard to guess what the author is trying to claim. Maybe he wants us to see the trend of points over time? No. The X axis is not the year, so this is not a trend of how a team is developing. I am confused about why the author picks final standings and points as the x and y axes. The trend lines (four in total) also leave me unsure what the author is trying to represent. This is not a maze and there is not too much data, but it is a vague dashboard: the audience cannot get a clear claim from the author. So do not take it for granted that your audience knows what you are thinking. State it clearly.

Let’s look at another dashboard about the Premier League:

https://public.tableau.com/en-us/s/gallery/premier-league-15-16-so-far

  1. Who plays Where: If you want to show a player’s activity zone, it would be better to show a zone, not a single spot. Giving an average spot loses other important information. Usually we would see a player’s hot spots: the deeper the color, the more frequent the player’s activity. That is useful for analyzing a player: why did he play more in that area? Where is his favorite area? This matters especially when a coach changes his role, for example from winger to midfielder. The “average” position is useless for drawing any conclusions.
  2. HOW DO THEY PLAY? The beautiful lines convey no information. The author packs too much information into a tiny space. The lines cross each other; it’s beautiful but totally non-functional.

Finally let’s look at an interesting dashboard:

https://public.tableau.com/en-us/s/gallery/premier-league-ranking

It’s a beautiful chart. But the author wants to answer the question: which teams unexpectedly snatch more than their fair share of points from sides above them? I did not find the answer. Are we supposed to count how many green cubes sit above a team and divide by all the cubes above that team? No one wants to do that. Besides, since Liverpool is in 3rd place, there is less to compare. A better way is to compare a team’s winning% against the top six and the bottom six; then you would find the answer. In fact, Liverpool played best against top teams this season, but it’s hard to figure this out from the little cubes.
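The top-six versus bottom-six comparison suggested above could be computed roughly as follows. This is a sketch under assumptions: the input format (a list of opponent rank and result pairs), the function name and the sample matches are all invented for illustration and are not taken from the dashboard’s data.

```python
def win_rate_by_tier(matches, league_size=20, tier_size=6):
    """Winning percentage against the top and bottom `tier_size` teams.

    `matches` is a list of (opponent_rank, result) pairs, where result is
    "W", "D" or "L". This input format is an assumption for illustration.
    """
    tiers = {"top": [], "bottom": []}
    for opponent_rank, result in matches:
        if opponent_rank <= tier_size:
            tiers["top"].append(result)
        elif opponent_rank > league_size - tier_size:
            tiers["bottom"].append(result)
    # None marks a tier with no matches, so we never divide by zero.
    return {
        tier: (100 * results.count("W") / len(results) if results else None)
        for tier, results in tiers.items()
    }


# Invented results for one imaginary team, not real fixtures.
sample = [(1, "W"), (2, "D"), (4, "W"), (5, "L"),
          (10, "W"), (16, "W"), (19, "W"), (20, "D")]
print(win_rate_by_tier(sample))
```

Two numbers per team answer the chart’s question directly, and they could be shown as a pair of bars per club instead of a grid of cubes.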

 

Fox News: What more can I say!

More than 2.8 billion people get their news from television, and it is painful to see how top news channels are creating news rather than showing the truth. It has become a trend to spice up the news to gain viewership, focusing more on money than on the actual duty of enlightening people. The sinking of the Maine is one such example, where public opinion was stirred against Spain by a deceptive press. Such false claims helped push the United States into war, all due to misconceptions spread by the press of the time.

Here too we have the wonderful Fox News once again trying its best to show something that does not quite match its data. The dashboard shows unemployment rates in 2011, during President Obama’s time in the White House. One can clearly see that the line graph is wrong in places. For example, the last point, above November, has a value of 8.6% yet is placed on the same level as 9%. One can also see that a point with 8.8% is placed lower than the point with 8.6%.

We do not know whether this was done for political reasons or was a technical mistake, but one thing is clear: a news giant like Fox knows what it is doing, so the chances of it being a technical glitch are slim. Numerous studies and experiments have found that people believe what they see, and this tendency grows as the audience goes from more educated to less educated.

There is nothing good to say about such a dashboard, except perhaps that the topic is worth our while, since many people all around the world are facing unemployment. Moreover, it looks validated because it names the Bureau of Labor Statistics as the source of the data.

 

Modification-

  1. The most basic modification is to build the chart with a correct line graph. The correct graph with the same numbers shows that unemployment decreased at a fast pace at the end of the year, which may be why Fox News drew it deceptively. The graph shown by Fox News sends a completely different message from the correct one, even though both use the same data. https://drive.google.com/a/scu.edu/file/d/0B8KmJc9yUb7ybnNwSjBSc0xYSGc/view?usp=sharing
  2. It would certainly have been better to show the unemployment trends of previous years and compare them with this graph, giving the audience a broader picture. Each year shows a wide variety of fluctuations, and in such situations it helps to know the overall fluctuation to better understand such an important topic.

https://drive.google.com/a/scu.edu/file/d/0B8KmJc9yUb7yV09OUnNIazNrcU0/view?usp=sharing
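The kind of misplacement described earlier (8.6% drawn level with 9%, and 8.8% drawn below 8.6%) can be detected mechanically by checking that the drawn heights keep the same order as the values. The sketch below is an illustration only: the three rates come from the text, but the function name, the heights and the order of the points are invented to mimic the distortion.

```python
from itertools import combinations


def misplaced_pairs(values, heights):
    """Return index pairs whose drawn heights contradict their values.

    A pair (low, high) is flagged when values[low] < values[high] but
    point `low` is not drawn strictly lower than point `high`.
    """
    flagged = []
    for i, j in combinations(range(len(values)), 2):
        if values[i] == values[j]:
            continue  # equal values impose no ordering
        low, high = (i, j) if values[i] < values[j] else (j, i)
        if heights[low] >= heights[high]:
            flagged.append((low, high))
    return flagged


# Rates are from the text; heights are hypothetical pixel positions.
values = [9.0, 8.8, 8.6]
distorted_heights = [50, 40, 50]
faithful_heights = [50, 40, 30]
print(misplaced_pairs(values, distorted_heights))
print(misplaced_pairs(values, faithful_heights))
```

A chart drawn on a consistent linear scale should produce an empty list; any flagged pair means the line misrepresents its own labels.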

Charity done wrong?

 

The visualization below is part of an article named “The truth about the Ice Bucket Challenge: Viral memes shouldn’t dictate our charitable giving”. The article talks about the rationality of the choices people make when donating money for healthcare. It suggests that people often choose to donate to a disease that has affected a loved one, as opposed to diseases that have affected more people or that receive the least funding from government and healthcare companies. Other times, these decisions are not even based on ethical or emotional values but on celebrity involvement, for example the ALS Ice Bucket Challenge.

To support these points, the visualization below was used. It shows major causes of death and the money raised for each of them, which (when you analyze the picture for a good 10 minutes) shows that charity choices are going badly wrong and are not being made for the right causes.

There is absolutely nothing in this visualization that makes it worthy of such a strong claim, i.e. “the donations are not going to the causes which need the most urgent attention”.

Problems with the picture:

  1. Color Palette: It takes a while to realize that each color indicates a cause, and to make it worse, there are 3 colors (breast cancer, HIV/AIDS and motor neuron disease (including ALS)) which belong to the same family, making them difficult to differentiate, and impossible for the color blind.
  2. Alignment: At first look, a person would heuristically expect the deaths and donations for each cause to be aligned together. In other words, the first circle would show the deaths and donations of the first cause, and so on. Once you realize that’s not the case, it becomes practically a match-the-columns exercise. Understanding the pattern at a glance is almost impossible.
  3. Circular representation: The circles do not help in understanding the amounts. This is just a poor choice of pictorial representation when the data could easily have been shown with a line or bar graph.
  4. Legend: If the graph choice were correct, the labels would have been enough, making the legend redundant. Correct labels and the right graph would make it really easy to convey the point it is supposed to make.
  5. Data problems: It is the legend that makes you realize that the sub-text is actually the organization to which the donations are being made. This means that we are not even talking about all donations made for the cause in the country, yet we are still comparing them with the overall deaths.

Re-Designed Graph.

Source

Food consumption & Obesity

Fruit/vegetable consumption across US states

Percent obese across US states

The above visualizations depict eating habits across the US states. The article suggests that the higher the consumption of fruits/vegetables, the lower the obesity rate.

What I like about this visualization?

The first visualization shows fruit/vegetable consumption and the second shows the obesity rate across the US states. Both visualizations show what they say, but it would have been better to have one visualization that depicts the relation between fruit/vegetable consumption and the obesity rate. Also, each graph uses color, and the color varies with the range.

What I didn’t like about this visualization:

1. It’s difficult to understand the relation between fruit/vegetable consumption and obesity. One needs to look at both visualizations and then try to decipher the relation. A visualization should make the correlation between food consumption and obesity easy to understand and interpret.

2. The article takes only one factor into consideration to explain the obesity rate. It fails to consider other factors such as lifestyle, physical activity, and social and economic factors. It should have collected more data to determine which factors could lead to an increase or decrease in the obesity rate in different US states.

3. One could interpret the visualization as saying that Colorado is a skinny state as well as a state with less meat consumption, which is totally wrong.

Audience:

This article is written for rural Americans, to show the trend of food consumption and its relation to obesity.

Does it have good visualization properties:

1. Truthful: Yes, it is. Though it takes a combination of two visualizations to show the relation between fruit/vegetable consumption and the obesity rate, each visualization is truthful in what it says.

2. Functional: Yes, each visualization is functional and conveys the individual information.

3. Beautiful: It’s partially beautiful. It’s just a map with color variation, and not that appealing to the eye.

4. Insightful: Neither visualization gives any insight. The visualizations fail to show any trend.

5. Enlightening: It fails to enlighten from the audience’s perspective. It shows food consumption and the obesity rate but offers no enlightenment about what action could be taken to reduce the obesity rate.

How this graph could have been better:

1. Instead of two visualizations to show the relation between fruit/vegetable consumption and the obesity rate, one could have used a dual-axis chart and depicted the relation in a single graph.

2. Use of a bar graph. One could use a bar graph to depict food consumption across states and a line graph to show the obesity rate of each state.

3. Also, collecting data on and depicting other factors that could relate to the obesity rate would have been more insightful. Adding other factors would help decide whether consuming fruits/vegetables alone could lower the obesity rate or not.
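Before drawing a combined chart, one quick way to test the article’s claim would be to compute the correlation between the two state-level measures. The sketch below uses the Pearson coefficient, which is my choice of measure and not something the article reports; the function name and the four pairs of numbers are placeholders, not real survey data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


# Placeholder percentages for four imaginary states, not real data.
produce_pct = [10.0, 14.0, 18.0, 22.0]
obesity_pct = [34.0, 31.0, 27.0, 25.0]
print(round(pearson(produce_pct, obesity_pct), 3))
```

A value near -1 would support the claim that more fruit/vegetable consumption goes with less obesity, while a value near 0 would suggest the two maps are unrelated. Either way, correlation alone says nothing about the other factors mentioned in point 3.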

References:

Daily Yonder