What Data Do I Have?

I am rewriting last week’s blog entry with an example to emphasise the importance of the kinds of data that can be present (categorical, ordered, ratio) and how they can be visualized. I found a visualization that shows the Titanic survivors. The interesting thing about this graph is that it shows the number of individuals across different categories (dimensions): Status (survived/perished), Sex (male, female), Age (child, adult), and Class of travel (first, second, third, crew).

Even though the visualization looks chaotic at first glance, I really liked the way the author has arranged the dimensions so that each is connected to or grouped by the others. This lets us read off the exact number or percentage for any combination; for example, third-class female children who perished make up 1%. Similarly, hovering over a category gives its total, for example crew at 40% (885).

Another noteworthy design choice is the way the “mark” of the chart separates as it flows from the top category to the bottom one.
To conclude, the author has done excellent work in visualizing a dataset that contains several different categories.

 

Last Week’s Entry

For this week’s blog entry I would like to summarize the “what data” question. As I am starting to work on my projects, I wanted to look at the data I have collected so far.

There are two forms in which the data is stored:

  1. Table Data – table data has attributes and rows. Each column/field/attribute explains what type of data is present in the row.
  2. Metrics Data – metrics data has two or more dimensions that together represent a data point. This is more useful for analytical purposes.

Three broad kinds of data –

  1. Categorical data – the data is represented in categories, such as movie genres like humour, horror, etc. We cannot do calculations solely on this data.
  2. Ordered Data – the data is presented in terms of ranks. We can definitely say which city is better than another, but this kind of data cannot tell us how much better it is.
  3. Ratio Data – this data has numbers, and we can do calculations because the data is quantifiable. For example, we can clearly say the mileage of car A is better than that of car B by 6 miles/gallon.

By deciding which of these categories my data falls under, I can move on to the “why” part of the analysis.
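
To make the three kinds concrete, here is a minimal Python sketch of which operations make sense for each one. The sample values (genres, city ranks, mileage figures) are invented for illustration and are not from my project data.

```python
from statistics import mode

# Hypothetical sample values, made up for illustration only.
genres = ["horror", "humour", "horror", "drama"]       # categorical
city_ranks = {"City A": 1, "City B": 2, "City C": 3}   # ordered (rank 1 = best)
mileage_mpg = {"Car A": 30.0, "Car B": 24.0}           # ratio

# Categorical: counting and "most common" are meaningful, arithmetic is not.
most_common_genre = mode(genres)

# Ordered: we can compare ranks, but the gap between ranks has no meaning.
a_better_than_b = city_ranks["City A"] < city_ranks["City B"]

# Ratio: differences (and ratios) are meaningful quantities.
mileage_gap = mileage_mpg["Car A"] - mileage_mpg["Car B"]

print(most_common_genre, a_better_than_b, mileage_gap)
```

The point of the sketch is only that the kind of data limits the operations available, which in turn limits the charts that make sense.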

How to create better Dashboards.

Last week, we all completed exercise 2, and for some of us these were the first dashboards we had created. Even though using Tableau to create a chart or a graph is super easy, analyzing it to get to the results you need is time-consuming and requires a lot of iterations. So, after I completed my first dashboard, I tried to work out whether it was the best I could do. This led me to research methods and approaches for designing great dashboards, and I came across this article about “Designing and Building Great Dashboards”.

“Different people in the company ask for different data to be displayed and soon the dashboard becomes hard to read and full of meaningless non-related information.” (SMITH, 2015) So, focusing on these high-level design rules helps us create a dashboard that is worth the time and effort we put into designing it.

Rule 1: WHO ARE YOU TRYING TO IMPRESS?

The most effective dashboards target a single type of user and just display data specific to that ‘use case’.

 

Rule 2: SELECT THE RIGHT TYPE OF DASHBOARD

Dashboards come in different types, and each serves a specific purpose.
The types are Operational, Strategic/Executive, and Analytical dashboards.

 

Rule 3: GROUP DATA LOGICALLY – USE SPACE WISELY

Grouping data is very important to get the dashboard right. Data can be grouped either by department or by functional area.

 

Rule 4: MAKE THE DATA RELEVANT TO THE AUDIENCE

Ensure that the data you display on the dashboard is relevant to the users. The components should always be designed with the scope and reach of the data for your users in mind.

 

Rule 5: DON’T CLUTTER YOUR DASHBOARD – PRESENT THE MOST IMPORTANT METRICS ONLY

Whether the information added to fill the dashboard is useful or useless, a cluttered dashboard loses its impact. Clutter often takes the focus away from the important messages.

 

Rule 6: HOW OFTEN DOES THE DATA REALLY NEED TO BE REFRESHED?

For dashboards that are interactive, we always have to keep in mind that the data keeps changing and so the dashboard has to be updated.

https://www.geckoboard.com/blog/building-great-dashboards-6-golden-rules-to-successful-dashboard-design/#.WJ9m1rYrKRs

Heatmaps decoded!

Heat maps use color variance to visualize data. They are used extensively to display variance across different variables, to reveal any patterns among them, and to show whether any correlations are present between the variables.

  • The rows and columns of a table form the matrix structure of the heat map. Each cell of the matrix contains color-coded categorical data or numerical data displayed on a color scale. The cell represents the relationship between the variables of its row and column.
  • A legend should be given alongside the heat map to make it easier to understand. Numerical data requires a color scale in which colors blend into one another to show the variance between high and low values, while categorical data is color coded.
  • A heat map uses color differences to display changes in value, so it should be used to give a more generalized view of numerical data. It should not be used to display sensitive data that needs to be represented accurately.
  • Heat maps are well suited to showing changes in values over time. Any column or row can be used to denote the time steps.
  • The colors in the heat map should be chosen carefully, as the differences must be immediately visible to the human eye. Rainbow color schemes are widely used, as humans can perceive more shades of those colors. Grey color scales should be avoided, as they are difficult to perceive.
  • A typical use of heat maps is to show temperature changes in a city or town over months or years, or to depict the hottest and coolest places to stay.
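
The idea of a numerical color scale can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the two end colors, the linear blend, and the temperature values are all made up, and real tools offer many other color schemes.

```python
def value_to_color(value, low, high, cold=(255, 255, 204), hot=(189, 0, 38)):
    """Linearly blend two RGB colors according to where value sits in [low, high]."""
    if high == low:
        t = 0.0
    else:
        t = (value - low) / (high - low)
    t = max(0.0, min(1.0, t))  # clamp in case value falls outside the range
    r, g, b = (round(c + t * (h - c)) for c, h in zip(cold, hot))
    return f"#{r:02x}{g:02x}{b:02x}"


def heatmap_colors(matrix):
    """Map every cell of a numeric matrix to a hex color on a shared scale."""
    flat = [v for row in matrix for v in row]
    low, high = min(flat), max(flat)
    return [[value_to_color(v, low, high) for v in row] for row in matrix]


# Hypothetical average temperatures: rows are cities, columns are months.
temps = [[2, 10, 24],
         [5, 14, 30]]
colors = heatmap_colors(temps)
for row in colors:
    print(row)
```

Because every cell shares one scale, the lowest value gets the pale end color and the highest gets the dark end color, which is exactly why a legend is needed to read the values back.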

Source: http://www.datavizcatalogue.com/methods/heatmap.html

Heat Maps, to use or not to use?

https://www.bloomberg.com/news/articles/2015-12-03/electric-cars-can-t-take-the-cold

This week we had an interesting discussion in class, about when to use heat maps and how to interpret them.

I came across this heat map. The claim is that since electric cars generate power less efficiently as temperatures drop, they sell more on the West Coast than in other regions of the USA. Obviously, other factors, such as the West Coast being more technology savvy than the rest, also add to the sales.

About the visualization –

  • The heat map shows U.S. electric vehicle sales by region
  • On observing it, we see that there are 4 patterned boxes for 4 regions, and California is in light blue
  • At first glance, mixing one state with whole regions seems confusing
  • It takes time to interpret the heat map

The Underlying meaning

  • The visualization depicts the 4 regions and the sales made in each region in September
  • What they have tried to show is that, even on the West Coast, California sells the highest number of electric cars.
  • The number of cars sold in California is greater than the combined number for the Midwest, Northeast, South and the remaining West region.
  • But this could be shown through a bar graph as well, comparing sales in California with the rest. That would have been easier to interpret at first look.

Apart from temperature, factors like population should also be considered. California is the most populous state among the regions shown, so that could be one contributing factor as well.

 

References –

https://www.bloomberg.com/news/articles/2015-12-03/electric-cars-can-t-take-the-cold

Visualizations that really work

After working on visualization exercise 2, I really came to appreciate the amount of effort designers take to convey the meaning of their charts effectively. Tools such as Tableau give us chart suggestions to choose from, depending on the measures and dimensions we choose. The advantage is that translating the chosen attributes into a visualization is convenient for anyone, even without data skills. But this doesn’t necessarily deliver the insights that you want your visualization to communicate. (Going back to our viz exercise 2, it may not be enough to compare MSIS with other degrees; it would be more effective to show that MSIS is a better program to choose because it offers a more stable mid-career pay with less uncertainty.)

Steering away from viz exercise 2, on a general note, when you’re trying to think through the purpose of your visualization you could start off by answering two questions:

  1. Is your data trying to show ideas or statistics? – This leans more towards the underlying data than the form of the visualization. An example of an idea is an organization structure chart; an example of data-driven content is revenue growth for the last five years.
  2. Are you declaring something or exploring something? – An example of declaring something is projecting the quarterly sales of 2016 to your manager with the available sales data. However, suppose you want to understand why sales performance is lagging: you suspect that there is a seasonal drop and want to prove it with a quick visual. That becomes an exploratory kind of visualization, where form takes priority over the available information.

Now that you know what you want to communicate to your audience, the next step is to decide which type of chart to choose (i.e. when to use a bar chart over a line chart), and to think through an effective way of using the color palette. Other aspects could be the ordering in a bar graph, starting your x axis from zero, etc. While following these chart-making rules, you should always make sure your chart is communicating your claim/insight clearly.

Reference – https://hbr.org/2016/06/visualizations-that-really-work

Dangers of Bling Data Visualisation

Data visualisation has recently been making its way into the mainstream. However, this growing popularity is leading to more misconceptions about making attractive visualisations. The purpose of visualisation is to provide information that would otherwise be very difficult to infer from the voluminous data available. Hence, more emphasis needs to be given to conveying the correct information rather than incorporating too much “bling” into these representations.

Info-graphics can be catchy, aesthetically pleasing and thought-provoking. But these features hold no value if the info-graphics cannot fulfil their purpose of telling informative stories. Let us consider the example of a stream graph of movie box office receipts over time.

Stream graph of movie box office receipts from the New York Times

 

The graph definitely looks attractive. But can it explain its obvious intent? Probably not. It would take us some time to understand that the peaks of the curve represent the weekly sales of each movie and the area under the curve represents the total receipts. It is unclear why certain movies are below the zero line. Also, comparing different movies seems difficult by just looking at the graph. In short, no matter how appealing the graph looks, it fails to be a good decision-making tool.

Hence, while creating any data visualisation, we should not forget its true purpose. We need to keep in mind that visualisations are meant to define the data behind the data. And so, we should focus on making an informative data visualisation rather than a bling data visualisation.

Reference: http://www.information-management.com/news/news/the-dangers-of-bling-data-visualizations-10025306-1.html?zkPrintable=1&nopagination=1

Visualization As Research Tool In Psychology

This visualization was created to help academics in the field of psychology. More specifically, it presents some important contextual and historical influences in psychology, along with the related perspectives and psychologists.

 

The whole dashboard looks simple but elegant. It uses one type of mark and two types of channels to encode the data. Each node represents one element, such as a psychologist or a social method. Color differentiates the types of information. When the user selects a node, the related nodes are enlarged: the more relevant two nodes are to each other, the larger the related node becomes. The hidden x-axis is time-based, from the 19th century on the left to the 21st on the right.

Using this visualization tool, I believe students can master the study of psychology more easily. It also encourages students to explore on their own. However, it would be better if it told us why the linked topics are relevant.

Reference: 

http://www2.open.ac.uk/openlearn/CHIPs/index.html

 

Building Interactive Dashboards with Tableau Dashboard Actions–Filters

To engage your users, you want to hand control over to them so they can discover more and reach insights that they find easier to retain. Here, I will share three different ways to leverage dashboard actions to improve your user experience.

Dashboard actions in Tableau allow you to add logic to dashboard components that triggers actions somewhere else. For example, you can add logic that says, “If a user clicks on Dashboard Sheet 1, I want something to happen on Dashboard Sheet 2.” To set up a dashboard action, navigate to “Dashboard > Actions > Add Action” in the top navigation from any dashboard view. Three types of action will be presented: Filter, Highlight, and URL. This week, I will introduce filters first.

  • Filter – If you click on sheet one, sheet two will be filtered to whatever you clicked. Example: here is an overview of the dashboard.

We could make every individual dashboard sheet act as a filter for the entire dashboard by hovering over the sheet, clicking the down arrow that appears in the upper right, and selecting “Use as Filter” (on all three sheets).

Then, when I click on any sheet, the other sheets are filtered to whatever I clicked on. For example, if we click on Washington in the map view, the trend line and bar chart sheets are filtered to just that state:

If not all of the sheets have enough data to show the details of each state, you should set the filters separately for each chart, for example using only the bar chart as a filter instead of the map.
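
To show what a filter action does to the data behind the sheets, here is a rough Python sketch. This is not Tableau’s API; the rows, field names, and helper functions are hypothetical and only mimic the “click on one sheet, filter the others” logic.

```python
# Hypothetical rows; the field names and numbers are invented for illustration.
sales = [
    {"state": "Washington", "month": "Jan", "category": "Furniture", "amount": 120},
    {"state": "Washington", "month": "Feb", "category": "Technology", "amount": 200},
    {"state": "Oregon", "month": "Jan", "category": "Furniture", "amount": 80},
]


def apply_filter_action(rows, field, selected):
    """Keep only rows matching the clicked mark; None means nothing is selected."""
    if selected is None:
        return list(rows)
    return [row for row in rows if row[field] == selected]


def totals_by(rows, key):
    """Sum the amount per value of key, like the aggregation behind a sheet."""
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + row["amount"]
    return totals


# "Clicking" Washington on the map filters the data feeding the other two sheets.
filtered = apply_filter_action(sales, "state", "Washington")
trend_line = totals_by(filtered, "month")
bar_chart = totals_by(filtered, "category")
print(trend_line, bar_chart)
```

Passing `None` as the selection returns every row, which corresponds to clearing the selection so that all sheets show the full data again.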

TO BE CONTINUED…

Reference: http://www.evolytics.com/blog/tableau-201-3-creative-ways-to-use-dashboard-actions/

Understanding a Box plot

I personally have never used a box plot because I didn’t know how or when to use it. But when the professor used a box plot in the last lecture to explain average violations per day, I found it more appealing. Box plots are a great way to quickly examine one or more datasets graphically. Of course, you need to know the meaning of all the parts of a box plot to understand it. Here is an easy and simple example of how to interpret a box plot.

  • A box plot (also known as a box-and-whisker plot) summarizes the data points by splitting them into quartiles (Q1, Q2, Q3); the box runs from the first quartile to the third quartile.
  • The vertical line drawn at Q2 is the median of the data set.
  • The two horizontal lines that extend from the front and back of the box are called whiskers. Whiskers often (but not always) stretch over a wider range of scores than the middle quartile groups.
  • Extreme points that lie beyond the ends of the whiskers, below the first quartile or above the third quartile, are known as outliers.

A box plot displays three common measures of the distribution of a data set.

  1. Range: the distance between the two extreme points on the plot. If we include outliers, it runs from 5 to 95, giving 90. If we exclude outliers, it is 95 - 15 = 80.
  2. Interquartile range: the middle half of a data set falls within the interquartile range. In a box plot, the interquartile range is represented by the width of the box (Q3 minus Q1). In the chart above, the interquartile range is 80 - 38 = 42.
  3. Skewness: we can identify different skewness patterns based on the shape of the data set. If the data points are concentrated at the lower end, the distribution is skewed right, and vice versa. If it is evenly split at the median, then it is symmetric.
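
The three measures above can be computed with Python’s standard library. This is a minimal sketch under my own assumptions: the scores are made up (they are not the values from the chart), outliers are defined by the common 1.5 × IQR rule, which is a convention the chart may not follow, and the skew label is only a rough heuristic that compares the two halves of the box.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Compute the numbers a box plot draws, using the 1.5 x IQR whisker rule."""
    data = sorted(values)
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1

    # Assumption: points more than 1.5 x IQR outside the box count as outliers.
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]

    # Rough heuristic: a longer upper half of the box suggests right skew.
    upper_half, lower_half = q3 - q2, q2 - q1
    if upper_half > lower_half:
        skew = "skewed right"
    elif upper_half < lower_half:
        skew = "skewed left"
    else:
        skew = "symmetric"

    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
        "range_with_outliers": data[-1] - data[0],
        "range_without_outliers": inside[-1] - inside[0],
        "skew": skew,
    }


# Hypothetical scores, invented for illustration.
scores = [1, 10, 12, 13, 14, 15, 16, 18, 40]
summary = box_plot_summary(scores)
print(summary)
```

With these made-up scores, the values 1 and 40 fall outside the fences, so the whiskers stop at 10 and 18 and the range shrinks a lot once the outliers are excluded.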

In the speed violations example, we can easily identify danger zones, which are simply the outliers in the box plot. Also, our grade distribution on Camino is a box plot that shows where your grades stand among the overall class grades, what the average score is, and how many are above/below average.

I am trying to create a box plot in Tableau; if anybody has already done so, please share!

Source: http://www.datavizcatalogue.com/methods/images/anatomy/box_plot.png

http://stattrek.com/statistics/charts/boxplot.aspx

 

 

 

 

15 JavaScript frameworks and libraries (part 2, to be continued)

6. jQuery

jQuery is another JavaScript library, used for event handling and animation. jQuery has an easy-to-use API, so when working on a web project it takes less time to complete simple tasks, and it is compatible with most web browsers. jQuery can manipulate the DOM and handle Ajax. It also separates HTML and JavaScript code, which makes the code cleaner.

7. Ember.js

Ember.js is a mix of Angular.js and React.js. It is similar to Angular.js when syncing data: the two-way data exchange makes web applications faster and more scalable. Developers can create front-end elements. Its server-side rendering is similar to a Virtual DOM and provides better performance and scalability. Its community also provides sample code and libraries.

8. Polymer.js

Polymer.js is useful for building on HTML5; its main focus is extending HTML’s functionality by letting developers create their own tags. For example, a developer can create a custom element with its own functionality, similar to a built-in HTML5 element.

9. Three.js

Three.js is another JavaScript library, popular for 3D development. Three.js uses WebGL and can be used to render 3D objects. It is well suited to writing web-based games such as HexGL.

https://opensource.com/article/16/11/15-javascript-frameworks-libraries