Spot Visualization Lies – Part II

Odd Choice of Binning

Instead of showing the full range of variation in a data set, someone might oversimplify a complex pattern. It’s easy to transform a continuous variable into a categorical one, and the choice of bins decides which patterns survive. Broad binning can be useful, but complexity is often what makes things worth looking at. Be aware of oversimplification.
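A small sketch of how bin width changes the story, assuming hypothetical ages with two clusters. The helper `bin_counts` is illustrative, not from any charting library: two broad bins make the data look like an even split, while 10-year bins reveal two separate groups and an empty middle.

```python
from bisect import bisect_right

def bin_counts(values, edges):
    """Count values into bins; bin i covers [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect_right(edges, v) - 1  # index of the bin whose left edge is <= v
        if 0 <= i < len(counts):        # ignore values outside the outer edges
            counts[i] += 1
    return counts

# Hypothetical ages: one cluster in the 20s and another in the 60s
ages = [21, 23, 24, 26, 28, 61, 63, 64, 66, 68]

coarse = bin_counts(ages, [0, 50, 100])           # two broad bins
fine = bin_counts(ages, list(range(0, 101, 10)))  # ten 10-year bins

print(coarse)  # [5, 5]
print(fine)    # [0, 0, 5, 0, 0, 0, 5, 0, 0, 0]
```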

Area Sized by Single Dimension

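Most of the time, human eyes cannot accurately judge the area of a square or a circle. When data values are mapped linearly to a single dimension of an area-based encoding, such as the side of a square or the radius of a circle, the area grows with the square of the value, so differences look far larger than they are. Someone doing this might be going for dramatics.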
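The arithmetic behind this can be checked in a few lines. This is a minimal sketch with two hypothetical values where one is twice the other; `area_ratio` and `radius_for` are illustrative helpers. Sizing the radius linearly makes the larger circle four times the area, while taking the square root of the value keeps the area ratio at 2.

```python
import math

def area_ratio(r_small, r_large):
    """How many times larger the second circle's area is than the first's."""
    return (math.pi * r_large ** 2) / (math.pi * r_small ** 2)

def radius_for(value):
    """Radius that makes a circle's area equal to `value` (area = pi * r^2)."""
    return math.sqrt(value / math.pi)

# Two hypothetical values; the second is twice the first
a, b = 1.0, 2.0

print(area_ratio(a, b))                                    # 4.0: radius sized linearly, area quadruples
print(round(area_ratio(radius_for(a), radius_for(b)), 6))  # 2.0: area sized to the data
```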

Variation with Area Dimensions

Maybe someone knows how area works as a visual encoding, but then varies the shapes’ dimensions anyway. Shapes can fill the same amount of area yet look very different, and the result is still dramatic.

Extra Dimensions

When you see a chart that is three-dimensional for no good reason, it is worth questioning the data, the chart, the author, and everything based on the chart. That extra dimension could be nothing but a distraction.

Important: A visualization is not necessarily lying just because it exhibits one of the qualities mentioned above. With that in mind, make sure your reaction is justified before you call someone a liar.

As a rule of thumb, scrutinize charts that shock you or seem more dramatic than you expected.

https://flowingdata.com/2017/02/09/how-to-spot-visualization-lies/

Introducing Great Visualization TED Talks and Gapminder

This week I focused my research on data visualization on the renowned statistician Dr. Hans Rosling, a scientist with a great sense of responsibility who worked to make the world better with datasets that change people’s mindsets. May he rest in peace in heaven.

I started by watching a TED talk by David McCandless, the founder of the website “informationisbeautiful.net”, who explained the purpose of visualizing data: compressing information overload to reveal the patterns and connections that matter. Mr. McCandless regards Dr. Rosling as his master.

Then I found Dr. Rosling’s talk, ranked as one of the top 500 TED talks, which he gave to the US State Department on the health issues of developing countries. Dr. Rosling used his animated visualizations to illustrate how child mortality, life expectancy, and HIV prevalence changed over time in different countries. Dr. Rosling was an enthusiastic scholar who cared about improving the overall health of the world, and I greatly respect him for that. I also discovered Gapminder, the awesome website that Dr. Rosling founded.

Because people can quickly comprehend information conveyed in a picture, and because data sets today are so enormous, using data visualization is inevitable. Just as programming is the way to communicate with computers, data visualization will become a common language and a tool for interacting efficiently with people’s mindsets. For now, the resources above are enough for me to go over as a rookie visualizer, and I’ll need more knowledge of data mining, data processing, and statistics. I hope that one day I can express data sets freely in ways that catch people’s eyes and make them ponder. Then I can proudly call myself “a data artist”.

Spot Visualization Lies – Part I

Lying with statistics has been around for a long time, but charts tend to spread far and wide these days, and some don’t tell the truth. So it’s all the more important now to quickly decide whether a graph is telling the truth. This is a guide to help you spot visualization lies.

Truncated Axis

Bar charts use length as the visual cue, so when someone shortens the bars by truncating the value axis while keeping the same data, the chart dramatizes the differences. The author may want to show a bigger change than the data actually supports.
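To see how much a truncated axis exaggerates, compare bar lengths measured from the axis baseline. This sketch assumes two hypothetical values, 95 and 100; `apparent_ratio` is an illustrative helper. With a zero baseline the second bar is only about 5% longer, but starting the axis at 90 makes it look twice as long.

```python
def apparent_ratio(a, b, baseline=0.0):
    """How many times longer bar b looks than bar a when the value axis starts at baseline."""
    return (b - baseline) / (a - baseline)

sales_2016, sales_2017 = 95.0, 100.0  # hypothetical values

print(f"{apparent_ratio(sales_2016, sales_2017):.2f}")        # 1.05 (axis starts at 0)
print(f"{apparent_ratio(sales_2016, sales_2017, 90.0):.2f}")  # 2.00 (axis truncated at 90)
```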

Dual Axes

With dual axes, the scale of each metric can be shrunk or stretched independently. This is typically done to imply correlation and causation between two events that are actually independent of each other.

It Does Not Add Up

Some charts specifically show parts of a whole. When the parts add up to more than the whole, this could be a problem.
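A quick sanity check is to add up the parts before trusting a part-of-a-whole chart. This is a minimal sketch using hypothetical poll percentages; `parts_add_up` and its `tolerance` (to allow for rounding) are illustrative assumptions. Note that multi-select survey questions can legitimately exceed 100%, but then a part-of-a-whole chart is the wrong choice anyway.

```python
def parts_add_up(parts, whole=100.0, tolerance=0.5):
    """Return (ok, total): ok is False when the parts miss the whole by more than tolerance."""
    total = sum(parts.values())
    return abs(total - whole) <= tolerance, total

# Hypothetical poll results, in percent
poll = {"Candidate A": 60, "Candidate B": 35, "Undecided": 15}

print(parts_add_up(poll))  # (False, 110): the "whole" is 110%
```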

Seeing Only In Absolutes

Everything is relative. You can’t say a town is more dangerous than another because the first town had two robberies and the other had only one. What if the first town has 1,000 times the population of the second? It is often more useful to think in terms of percentages and rates rather than absolutes and totals.
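Here is the two-town example worked out as rates, assuming hypothetical populations of 1,000,000 and 1,000 (a 1,000-times difference). The helper `rate_per` is illustrative. The town with more robberies turns out to have a rate 500 times lower.

```python
def rate_per(count, population, per=100_000):
    """Events per `per` people (multiply first to keep the arithmetic simple)."""
    return count * per / population

# Hypothetical towns: Town A has 1,000 times the population of Town B
towns = {"Town A": (2, 1_000_000), "Town B": (1, 1_000)}

for name, (robberies, population) in towns.items():
    print(f"{name}: robberies={robberies}, rate={rate_per(robberies, population):.1f} per 100,000")
# Town A: robberies=2, rate=0.2 per 100,000
# Town B: robberies=1, rate=100.0 per 100,000
```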

Limited Scope

It’s easy to scope dates and time frames to fit a specific narrative, so consider the history and proper baselines to compare against.
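The choice of time frame can flip the headline. This sketch uses a hypothetical yearly series that declines for years and then bounces once; `pct_change` is an illustrative helper. The narrow window reports a 20% rise, while the full history shows the value is still 16% below where it started.

```python
def pct_change(series, start, end):
    """Percent change between two keys of a {year: value} mapping."""
    return (series[end] - series[start]) * 100 / series[start]

# Hypothetical yearly values: a long decline with a one-year bounce at the end
values = {2010: 50, 2011: 48, 2012: 45, 2013: 40, 2014: 38, 2015: 35, 2016: 42}

print(pct_change(values, 2015, 2016))  # 20.0  (narrow window: "up 20%!")
print(pct_change(values, 2010, 2016))  # -16.0 (full history: still down)
```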

Due to the word limit, to be continued next week…

http://flowingdata.com/2017/02/09/how-to-spot-visualization-lies/

 

How NOT to use Tableau

1) Replicate a report or chart designed in another tool:

One cannot use Tableau to exactly replicate a visualization designed in another tool. Even a very simple visualization may be something Tableau is not meant to reproduce exactly, or it may be too difficult to build. Instead of trying to replicate a report or chart, one must understand the underlying purpose of the visualization and redesign it using Tableau’s best available features. In fact, this may give a whole new perspective on the original dataset. Undoubtedly, Tableau has some great features, but it is not meant to exactly mimic other viz tools.

2) Try to show tons of data on one screen with a dozen (or more) quick filters:

Sometimes the dataset at hand is huge, with many attributes and dimensions, and Tableau can certainly visualize large datasets. However, it is up to the user to decide what is important and what is not. Visualizing all possible combinations with many filters is likely to fail; ‘one size fits all’ does not help. Building interactive views that take the user to the desired levels of detail gives a much more holistic understanding and avoids displaying too much information on one screen.

3) Spend way too much time on formatting:

Tableau is a quick tool for visualizing your data. There may be corporate design standards to follow when creating visualizations in Tableau, e.g., using a particular font or color scheme. It can be fun to explore different colorful representations of an infographic, and Tableau supports formatting through a variety of dashboard objects, controls, and formatting options. However, spending too much time on formatting is not advisable: Tableau is not built for “pixel-perfect” reporting.

4) Connecting to already summarized data:

A summary report is a natural way for a human to read data, but not for a machine. Tableau wants to connect to raw data rather than data that has been manually classified and summarized into a table. A user might think Tableau can visualize data more effectively if it is already aggregated and summarized, but Tableau is built to do that aggregation itself, so we are not saving any time or making the visualization any better by feeding it summarized data. Tableau needs raw, clean data, not summarized data.
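One general reason raw data matters (not specific to Tableau’s internals): once data is summarized, some questions can no longer be answered correctly. This sketch assumes hypothetical order-level rows and a pre-built summary of average order value per region. Averaging the regional averages gives 60, while the true overall average from the raw rows is 40, because the regions have different numbers of orders.

```python
from statistics import mean

# Hypothetical raw order-level data: (region, order_value)
raw = [("East", 10), ("East", 20), ("East", 30), ("West", 100)]

# Aggregating from the raw rows gives the correct overall average
overall_from_raw = mean(value for _, value in raw)

# A pre-built summary table: average order value per region
summary = {
    region: mean(value for r, value in raw if r == region)
    for region in {r for r, _ in raw}
}

# Re-aggregating the summary silently weights each region equally
avg_of_avgs = mean(list(summary.values()))

print(overall_from_raw)  # 40
print(avg_of_avgs)       # 60
```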

Reference:

www.theinformationlab.co.uk/2013/08/27/how-not-to-use-tableau/

Interactive Data Visualization

Static visualizations can offer only precomposed “views” of data, so multiple static views are often needed to present a variety of perspectives on the same information. Dynamic, interactive visualizations can empower people to explore the data for themselves.

  1. The Novice User. Even novices must be able to examine data and find patterns, distributions, correlations, and/or anomalies. They must be able to build and use tools that enable faster decisions based on real-time information. As the National Research Council of the National Academies of Sciences states, even “naïve users” should be able to “carry out massive data analysis without a full understanding of systems and statistical uses.”
  2. Driving Processes. The solution must allow the user to establish KPIs that provide the rules that drive processes. These must be displayed visually, for example by color, in real time based on defined thresholds. Like its architecture, Interactive Visualization is a means to an end – to stimulate informed action.
  3. Data Must Tell A Story. An intuitive, visual workplace that is easy to master is based on easily digestible interactive patterns. Data must tell a story that instantly relates the performance of a business and its assets. Almost every Interactive Visualization narrative takes place across multiple layers. Users must thus be able to select data elements and filters, and then highlight and modify options to change data perspectives – from high-level overviews down to the most granular detail.
  4. Data Correlation. The user should immediately know not only of hot spots that require attention, but also effortlessly find trends based on the dynamic relationship between multiple data streams and the data derived from them by means of predictive analytics.
  5. Prescriptions: “What should happen next?” World-class Interactive Visualization and underlying analytics capabilities go a step beyond predictive analytics by offering prescriptive analytics (“What should happen next?”) to drive real-time asset behavior modification.
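The "Driving Processes" property describes KPIs displayed by color against defined thresholds. A minimal sketch of that rule, assuming a simple three-color scheme with two thresholds; `kpi_color`, `warn`, and `critical` are hypothetical names, not part of any particular dashboard product.

```python
def kpi_color(value, warn, critical, higher_is_better=True):
    """Map a KPI value to a traffic-light color using two thresholds."""
    if not higher_is_better:
        # Negate everything so the same comparisons work for "lower is better" KPIs
        value, warn, critical = -value, -warn, -critical
    if value >= warn:
        return "green"
    if value >= critical:
        return "amber"
    return "red"

# Hypothetical on-time delivery rate (%): warn below 95, critical below 90
print(kpi_color(97, warn=95, critical=90))  # green
print(kpi_color(92, warn=95, critical=90))  # amber
print(kpi_color(85, warn=95, critical=90))  # red

# Hypothetical defect rate (%): warn above 2, critical above 5
print(kpi_color(3, warn=2, critical=5, higher_is_better=False))  # amber
```

In a live dashboard the same rule would simply be re-evaluated as new values stream in.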

One of the best interactive visualizations of 2015, according to experts, is about machine learning. For a complete description, see: http://flowingdata.com/2015/12/22/10-best-data-visualization-projects-of-2015/

References:

http://www.forbes.com/sites/benkerschberg/2014/04/30/five-key-properties-of-interactive-data-visualization/#a5efa2344eb0

http://chimera.labs.oreilly.com/books/1230000000345/ch01.html#_why_interactive

Frontiers in Massive Data Analysis (National Academy of Sciences 2013)

10 Best Data Visualization Projects of 2015

 

Is Your Dashboard Useful?

A dashboard cannot help you communicate your data effectively if you don’t know how to build it. Having a dashboard will not make you data-driven; having a useful dashboard will. A useful dashboard is one that can be understood at a glance. There are 5 stages to making a useful dashboard:

Stage 1: Curiosity
Identifying the need to be data-driven, but not yet what has to be done to become data-driven.

Stage 2: Play
Building your first dashboard and analyzing data. However, you haven’t identified the right business tools and processes.

Stage 3: Clutter
Sharing the dashboard with colleagues, discussing the data with them, and manipulating it. The business metrics needed to reach business goals still have to be identified.

Stage 4: Clean Up
Deciding on business goals and the metrics that align with achieving them. Ownership of each metric is not yet identified.

Stage 5: Focus
Understanding what data is driving the business and what can be done to achieve the goals. Being data-driven.

Dashboards are never static; they change as your business goals change.


Visualization Learning Tips from a Hans Rosling Video

 

  1. Dividing the data: Segmenting your data further often reveals insights that averages hide. In the video, the story of sub-Saharan Africa having the lowest GDP vs. child survival rate is completely different once the data for that region is segmented by country. After segmentation, we see that countries like Mauritius have a much higher ratio than other countries in the same region.
  2. Treating each data point separately: Each data point can be associated with a different problem, and after carefully analyzing it, you can provide solutions accordingly. For example, the same solution can’t be applied to the poorest in Nigeria and the richest in South Africa.
  3. Usage of idioms: The visualization idiom you use should immediately convey the information you are trying to communicate and what your chart is measuring.
  4. Checking your legibility: Run it by someone who has never seen your visualization, and ask them to tell you what the chart is supposed to be illustrating. The longer they take, the worse you’ve done.
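The first two tips can be sketched with a regional average and a per-country breakdown. The countries and numbers below are hypothetical (not Gapminder data), and the 5-point cutoff for "stands out" is an arbitrary assumption. The regional average of 89 hides the fact that one country is far ahead of its neighbors.

```python
from statistics import mean

# Hypothetical child-survival rates (%) for countries in one region
region = {"Country A": 85.0, "Country B": 87.0, "Country C": 86.0, "Country D": 98.0}

regional_avg = mean(region.values())

# Countries more than 5 points above the regional average (arbitrary cutoff)
standouts = {country: rate for country, rate in region.items() if rate - regional_avg > 5}

print(regional_avg)  # 89.0
print(standouts)     # {'Country D': 98.0}
```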

Reference: https://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen

http://online-behavior.com/analytics/effective-data-visualization

 

 

 

 

Which visualization tool should I use for what kind of data?

In the field of analytics and data visualization, it is important to understand the type and meaning of the data we are dealing with, apart from understanding the data itself. Each data set will contain different types of data: statistical, numerical, informative, demographic, trends, sales data, etc. As data/business analysts and decision makers, we are confronted with numerous types of data sets, and it becomes essential to use the infographic and visualization that best depict them. Not only should our infographics convey the message clearly, they should do so in a way the end audience can assimilate easily. Hence, it is important to understand which tool or technology best visualizes the kind of data we have at hand. There are thousands of software packages and online tools at our disposal, yet we must know which one is appropriate in which situation. I wrote this post after reviewing an online survey of the tools and technologies people suggested for each kind of data.

Data: Businesses, academics, statistical data, market data based on industry, topic, or country.
Tool/Technology: Statista
Pricing: Free, Premium at $49/month

Data: Popular topics, online trends, and current events.
Tool/Technology: Google Trends
Pricing: Free

Data: Online tables, charts, and graphs.
Tool/Technology: Zanran
Pricing: Free

Data: Public opinion, social issues, and demographics in the U.S. and worldwide.
Tool/Technology: Pew Research Center
Pricing: Free

Data: Loads of infographics with customization
Tool/Technology: Piktochart
Pricing: Lite $15/month & Pro $29/month.

Data: Animated graphics and charts
Tool/Technology: ZingChart
Pricing: One-time fees range from $199 (Website) to $9,999 (Enterprise).

Key Properties of Interactive Data Visualization

“The data may not contain the answer. The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.” John Tukey (1915-2000)

To build a successful interactive data visualization, the graph should have these properties: the Novice User, Driving Processes, Data Must Tell A Story, Data Correlation, and Prescriptions (“What should happen next?”).

To test these claims, I chose one of the most famous interactive data visualizations, introduced by Hans Rosling:

http://www.gapminder.org/tools/#_state_time_value=2015;&marker_select@;&opacitySelectDim=0.00;;&chart-type=bubbles

The Novice User: The interactive visualization is organized by time and country. It is very easy for a novice user to play with, and we can clearly see that life expectancy is increasing overall. The differences between countries and continents are also shown clearly.

Driving Processes: The visualization uses animation to show the audience how the population changes from year to year.

Data Must Tell A Story: Hans Rosling even made a 4-minute video for the story part. Please check the reference.

Data Correlation: The user can immediately see not only hot spots that require attention, but also trends based on the dynamic relationships between the variables.

Prescription: What should happen next?

Please see the YouTube video in the references below; it shows an overall trend at the end.

Reference:

Hans Rosling’s 200 Countries, 200 Years, 4 Minutes – The Joy of Stats

https://www.youtube.com/watch?v=jbkSRLYSojo

Life expectancy vs Income

http://www.gapminder.org/tools/#_state_time_value=2015;&marker_select@;&opacitySelectDim=0.00;;&chart-type=bubbles

Five Key Properties of Interactive Data Visualization

http://www.forbes.com/sites/benkerschberg/2014/04/30/five-key-properties-of-interactive-data-visualization/#28008b0a44eb

Gun Deaths in America

Gun deaths in America are not a new phenomenon, and they are still prevalent across the American states. Many laws and policies have tried to reduce the number of gun killings, but most of these attempts have failed. FiveThirtyEight’s project explored multiple datasets from the Centers for Disease Control and Prevention’s Multiple Cause of Death database, which is derived from death certificates from all 50 states and the District of Columbia and is widely considered the most comprehensive estimate of firearm deaths. The project has an interactive graph showcasing the more than 33,000 annual gun deaths in the country.

The categories covered are as follows:

1) suicides among middle-aged men

2) homicides of young black men

3) accidental deaths

The visualization shows that for middle-aged men in certain geographic groups, the suicide rate is nearly 44 per 100,000, more than three times the national average risk of dying by suicide. For example, in Wyoming, approximately 80 percent of suicides are men, and a quarter are men ages 45-64.

The homicide category includes deaths by assault and shootings by police officers. The age group of the people in this category is 15-34. The visualizations also show mass shootings and accidental killings by police officers during terrorist shootings. Each visualization gives us a rough idea of the ratio or number of killings in various states. Such visualizations are important for making people aware of the problem and bringing the number down.

 

Reference: https://fivethirtyeight.com/features/gun-deaths/