Why should we use D3.js?

As we studied in class, D3.js, created by Mike Bostock, stands for Data-Driven Documents and is widely used to create interactive visualizations on the web. In the article referenced below, David Miller gives a few reasons why we should use D3.js:

1. Lots of examples

Seriously, D3 has a tremendous number of examples available online. Besides those in the official D3 gallery, thousands of other D3 examples can be found on the web, and you can use them as a starting point for your own code. Almost every kind of chart you can think of, such as a scatterplot, wind map, or chord diagram, has example code you can build on.

2. Vibrant open-source community

D3 has been forked over 9,000 times on GitHub, which makes it one of the most popular projects on the site. There are also third-party “wrapper” libraries, such as NVD3 and Vega, that aim to speed up the development of common types of D3 visualizations.

3. Opportunity to learn web development skills

One thing that makes D3 a better tool than Tableau is its greater interactivity, which comes from being built directly on web technologies. So while you are learning D3, you also pick up web development skills.

Reference: http://d-miller.github.io/Why-Learn-D3/

Visual Imagery and Simplicity

Last week, I wrote a blog post about how visual embellishments and imagery can be useful and help people remember data for longer.

But, again, context and simplicity play a very important role when using visual imagery. Below is a link to an infographic that depicts what travelers hate the most about travelling.

http://junkcharts.typepad.com/.a/6a00d8341e992c53ef01bb0793f46b970d-pi

This infographic is poorly organized, and instead of making the data easier to follow, it complicates it.

Reading the infographic takes a while and strains the eyes.

The upper left corner says that 37% hate sitting in a middle seat, while the aircraft is drawn in the center. This data would have looked much better if the number had been placed on the seat. Or, best of all, just present the numbers in the form of a table:

37%  Hate sitting in the middle seat

25%  Don’t like chatty neighbors

and so on.

Reading the data this way would take much less time than working through each corner of the infographic and then wondering why each piece of information was placed where it is.

It is good to use infographics to make data more attractive and to present it creatively, but readability, organization, structure and SIMPLICITY should also be kept in mind.

 

Five years of Drought: Bi-Variate Map

Bivariate map: This type of choropleth map shows two variables on a single map, which lets us portray two separate phenomena simultaneously. The two variables should be related to each other, because a bivariate map shows agreement or disagreement between them. If you do not expect any association between them, a bivariate map is not the right choice.

One of the most important features of a choropleth is that it should represent only normalized data (rates or ratios rather than raw counts), usually grouped into classes with methods such as standard deviations, nested means, quantiles, or equal areas.

This infographic piqued my interest in bivariate maps and in how and where they should be used. It is a bivariate map because it renders size by frequency and color by severity simultaneously.

When many data points overlap at one location, the individual variates are no longer visible and the saturation and intensity of the colors are lost.

To solve this problem, hexagonal binning has been used, which is an effective way to aggregate and visualize data. Each bin represents the number of points that fall within one hexagon of the grid laid over the map.
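
As a rough illustration of the idea, the sketch below assigns points to pointy-top hexagonal bins and counts how many fall into each one. The function names, the `size` parameter and the sample points are assumptions made for this example; the binning actually used for the drought map may differ.

```javascript
// Sketch of hexagonal binning (illustrative only, not the map's actual code).
// `size` is the centre-to-corner radius of each pointy-top hexagon.
function hexKey(x, y, size) {
  // Convert the point to fractional axial coordinates (q, r).
  const q = ((Math.sqrt(3) / 3) * x - (1 / 3) * y) / size;
  const r = ((2 / 3) * y) / size;
  const s = -q - r; // cube coordinates always satisfy q + r + s = 0

  // Round to the nearest hexagon, then repair the component with the
  // largest rounding error so that q + r + s = 0 still holds.
  let rq = Math.round(q);
  let rr = Math.round(r);
  const rs = Math.round(s);
  const dq = Math.abs(rq - q);
  const dr = Math.abs(rr - r);
  const ds = Math.abs(rs - s);
  if (dq > dr && dq > ds) {
    rq = -rr - rs;
  } else if (dr > ds) {
    rr = -rq - rs;
  }
  return rq + "," + rr;
}

// Count the points that land in each hexagon.
function hexBin(points, size) {
  const counts = new Map();
  for (const [x, y] of points) {
    const key = hexKey(x, y, size);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Hypothetical sample points: two near the origin, two near the next hexagon.
const bins = hexBin([[0, 0], [1, 1], [17.32, 0], [18, 0.5]], 10);
console.log(bins);
```

Each map entry is one hexagon, so overlapping points are replaced by a single count that can drive the size or color of that cell.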

The dots are sized in proportion to the amount of time over the past five years that a location experienced drought (the largest dots represent 80%–100% of the time). Time is difficult to show as a dimension in a static map; this map handles it by classing each location by how much of the past five years it spent in any drought (0-20%, 20-40%, 40-60%, 60-80%, 80-100%).

The second variate is color, which encodes a weighted value of the intensity of those droughts (deep purple indicates frequent “exceptional” droughts, the most severe category). It is based on a weighted sum of the number of weeks in which the location experienced drought (worse droughts count more).
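
To make the two encodings concrete, here is a minimal sketch of how one location's size class and color score could be computed. The category names (D0 to D4) and the weights are assumptions chosen for illustration; the map's real weighting is not given in the source.

```javascript
// Hypothetical severity weights: worse droughts count more.
const SEVERITY_WEIGHTS = { D0: 1, D1: 2, D2: 3, D3: 4, D4: 5 };

// weeksByCategory: weeks spent in each drought category at one location.
// totalWeeks: number of weeks in the observation window (about 260 for five years).
function droughtSummary(weeksByCategory, totalWeeks) {
  let droughtWeeks = 0;
  let severityScore = 0;
  for (const [category, weeks] of Object.entries(weeksByCategory)) {
    droughtWeeks += weeks;
    severityScore += weeks * SEVERITY_WEIGHTS[category];
  }
  const share = droughtWeeks / totalWeeks;
  // Size classes 0..4 correspond to 0-20%, 20-40%, 40-60%, 60-80%, 80-100%.
  const sizeClass = Math.min(4, Math.floor(share * 5));
  return { share, sizeClass, severityScore };
}

// Example: 52 weeks of D1 and 26 weeks of D4 out of 260 weeks.
const summary = droughtSummary({ D1: 52, D4: 26 }, 260);
console.log(summary);
```

The size class drives the dot size and the severity score drives the color, so the two variates stay independent even though they come from the same weekly records.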

Both variates are joined on the map by location, which makes the map easy to understand. It is one of the most effective ways to show the dynamics of drought, and the same approach can be used to represent social, geographic and demographic data in a more potent visual form.

Source: https://adventuresinmapping.files.wordpress.com/2016/07/fiveyearsofdrought1.jpg

 

 

CARDINAL RULES OF DATA VISUALISATION

The main purpose of visualization is to represent data in such a way that it becomes easy for viewers to focus on the important details. This allows a fair amount of flexibility in deciding the type of data representation. Different situations call for different designs, but there are certain cardinal rules that shouldn’t be broken lest they lead to confusion and misunderstanding. A few important ones are:

Baseline should be zero for bar charts

The data in a bar chart is encoded by the length of the bars. Hence, it is imperative that the baseline always starts at zero.

The picture above depicts the type of error that creeps in when the baseline is moved away from zero. As the baseline rises, the first bar shortens progressively while the second one, though also shortening, looks comparatively tall, giving a false impression of the difference between them.
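
The distortion is easy to quantify. The small sketch below compares the drawn lengths of two bars for different baselines; the function name and the sample values are made up for illustration.

```javascript
// Ratio of the drawn bar lengths for values a and b when the axis starts at `baseline`.
function barLengthRatio(a, b, baseline) {
  return (a - baseline) / (b - baseline);
}

// Two hypothetical values that differ by 10%.
const honestRatio = barLengthRatio(55, 50, 0);     // baseline at zero
const truncatedRatio = barLengthRatio(55, 50, 45); // baseline moved up to 45

console.log(honestRatio, truncatedRatio);
```

With a zero baseline the first bar is drawn 10% longer than the second, matching the data. With the baseline at 45 it is drawn twice as long, although the underlying values have not changed.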

Over-segmenting pie/donut charts

The general consensus is that the use of pie charts for data representation should be minimized. While that is a discussion for another day, pie charts, if not done properly, face a lot of restrictions.

The picture shows everything that could go wrong while designing a pie chart. A pie chart tends to become cluttered if the number of sections goes past four or five, and one like the chart depicted above gives no information to the viewer. It would be a better idea to choose an alternative representation for data involving a lot of variables.
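
If a pie chart has to be kept, one common workaround (not taken from the referenced article, which recommends switching chart types) is to keep only the largest slices and merge the rest into a single "Other" slice. A minimal sketch with hypothetical data:

```javascript
// Keep at most `maxSlices` slices; everything beyond that is merged into "Other".
function collapseSlices(slices, maxSlices) {
  const sorted = [...slices].sort((a, b) => b.value - a.value);
  if (sorted.length <= maxSlices) return sorted;
  const kept = sorted.slice(0, maxSlices - 1);
  const otherValue = sorted
    .slice(maxSlices - 1)
    .reduce((sum, slice) => sum + slice.value, 0);
  return [...kept, { label: "Other", value: otherValue }];
}

// Hypothetical seven-way split reduced to five slices.
const collapsed = collapseSlices(
  [
    { label: "A", value: 40 },
    { label: "B", value: 25 },
    { label: "C", value: 15 },
    { label: "D", value: 8 },
    { label: "E", value: 6 },
    { label: "F", value: 4 },
    { label: "G", value: 2 },
  ],
  5
);
console.log(collapsed);
```

The total is preserved, so the chart still represents the whole, but the viewer only has to compare five wedges.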

Respecting the parts of a whole

Data representations that portray multiple distinct, non-overlapping proportions should do justice to the whole, that is, the parts should add up to the whole. Consider the given example. While the figure on the left adheres to the principle of respecting the parts of a whole, the one on the right shows exactly what could go wrong in such a representation.
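
A quick sanity check along these lines can be run before drawing any parts-of-a-whole chart. The function name, the tolerance and the sample numbers below are assumptions made for this sketch.

```javascript
// Check that percentage parts add up to the whole, allowing a small rounding tolerance.
function partsOfWhole(parts, whole = 100, tolerance = 0.5) {
  const total = parts.reduce((sum, part) => sum + part, 0);
  return { total, ok: Math.abs(total - whole) <= tolerance };
}

console.log(partsOfWhole([50, 30, 20])); // parts that form a proper whole
console.log(partsOfWhole([60, 63, 70])); // overlapping categories, not a whole
```

When the check fails, the categories probably overlap (for example, survey answers where respondents could pick several options), and a bar chart is usually a safer choice than a pie.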

Serve the main purpose

The main purpose of any data representation is to show the data in a lucid and appealing manner.

The whole purpose is defeated if the representation portrays the data in a way that viewers cannot easily perceive, for example when overplotting hides data points behind one another. Altering symbol sizes and shapes, using transparency and organizing data into subgroups are some of the ways to counter overplotting.

Explain the encodings/symbols

Never assume that the data represented is obvious and easy to understand. It goes a long way towards increasing the quality and relevance of the representation if everything used is labeled and explained.

For example, a downward slope, as shown in the picture, could describe any decreasing variable under the sun. It is only when the axes are labeled and the context explained that the representation starts making sense.

Source: http://flowingdata.com/2015/08/11/real-chart-rules-to-follow/

Characteristics of Deceptive Visualization!

Data visualization is widely used to convey information, to prove certain facts and to show trends. But visualizations are often deceptive: they are modified in such a way that they support a particular claim. The following are some common deception techniques to watch for:

  1. Truncated Axis: The Y-axis can be altered to exaggerate the values being represented. Instead of having its origin at 0, it can start at some other value to give an illusion of larger differences between values. This is one of the most common techniques for deception.
  2. Area as Quantity: Using area to denote quantity is also a widely used deception technique. The values can be drawn as circles or any other shape that covers an area. Some shapes can appear larger than the values they stand for, which leads to a wrong reading of the information. A one-to-one mapping between the data and the graphical area is a better way of using area as quantity.
  3. Changes in Aspect Ratio: This type of deception is most often applied to line charts. Changing the aspect ratio may give an illusion of an increase or decrease of one quantity against the other, altering the viewer’s perception of a graph.
  4. Inverted Axis: Inverting an axis changes the direction of the trend, which gives the viewer the opposite of the correct information. This technique doesn’t exaggerate or understate the data but completely changes the message of a visualization.
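
For the "area as quantity" case, the one-to-one mapping can be sketched as follows: the radius of a circle should grow with the square root of the value, so that the area (not the radius) is proportional to the data. The function names and numbers are illustrative assumptions.

```javascript
// Honest encoding: circle AREA is proportional to the value.
function radiusForValue(value, maxValue, maxRadius) {
  return maxRadius * Math.sqrt(value / maxValue);
}

// Deceptive encoding: circle RADIUS is proportional to the value,
// so the area grows with the square of the value.
function naiveRadius(value, maxValue, maxRadius) {
  return maxRadius * (value / maxValue);
}

const circleArea = (radius) => Math.PI * radius * radius;

// A value of 25 is a quarter of 100, so its circle should have a quarter of the area.
const honestShare = circleArea(radiusForValue(25, 100, 20)) / circleArea(20);
const naiveShare = circleArea(naiveRadius(25, 100, 20)) / circleArea(20);
console.log(honestShare, naiveShare);
```

With the square-root scale the smaller circle covers 25% of the larger one's area, matching the data. With the naive scale it covers only 6.25%, which makes the difference look far bigger than it is.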

Source: https://medium.com/@Infogram/study-asks-how-deceptive-are-deceptive-visualizations-8ff52fd81239#.58vad76t0

Introducing Google Trends Database

Most great tech companies have a tech blog to demonstrate their progress, and Google is no exception. As I was browsing Google’s blog today, I found a very interesting article about a database called Google Trends, which collects real-time data on various topics searched through Google. A topic can be the popularity of a basketball game, interest in a literature project over time, or the 2016 elections. You can also view the real-time data in different formats, such as a line chart, bar chart or map.

With Google providing unlimited correlations and topics, we can retrieve useful data more conveniently. What’s more, Google also announced the acquisition of the data analysis platform Kaggle this week, showing its great interest in using data to deliver more valuable information.

Those two pieces of news are enough to show Google’s focus on data. For us, this suggests that data visualization, the way to present large amounts of data, will become more and more popular, and might someday be as common as Office tools are today. I think it’s a good time to pick up this technique and move into the future along with the great tech companies. 😀

Interactivity in Tableau (continued, again!)

We are focusing on Actions this week. The set of features within Actions allows the user to connect multiple visualizations/dashboards and to link a visualization to an external URL. The first feature, connecting multiple visualizations, allows a visualization to act as a filter item or as a data highlight item for another visualization. Sounds confusing? Let’s take a detailed look at these features:

  • Use as a filter action: This action can be used when a visualization has to be used as a filter item for another visualization (this holds good for a dashboard, as well). This is particularly useful when there is data that is interlinked between various visualizations. Instead of keeping a filter card, the user can have an additional visualization, which acts as a filter, thereby making better use of space and adding more context into the analysis. While creating the filter, the user has to select the source and target visualizations which will be used by the filter action. The action trigger can be set using the following three options:
    • Hover: The trigger for this action is when the user brings the mouse over the data point in the source visualization. This kind of action is good in cases when the user is performing data exploration, rather than data analysis. This is because while doing data exploration the user will be quickly scanning across the data points rather than stopping at a single data point to perform any analysis. Also, if the number of records in the data set is large, hover action could cause performance lags.
    • Select: The trigger for this action is when the user selects the data point in the source visualization. This action is suggested when the user has to perform data analysis or when the number of affected target visualizations is large.
    • Menu: The trigger for this action is when the user selects a data point and then picks the appropriate option from the context menu. A menu action is best used when the designer wants to give the user an extra layer of choice before applying the action. A good practice is to use this action for cases where the user has to be navigated away from the existing screen (it could be a different dashboard/visualization/URL).

The user also has options for what happens when he/she clears the action trigger. As an example from our speed violation data set, suppose we have two visualizations in a dashboard. The first visualization shows the map of Chicago with the addresses marked as per violations reported. The addresses are also clustered based on geographical zones: north, south, east, and west. The second visualization shows the history of violations for each address, along with additional statistical data such as deviation from daily average, max/min for that address, etc. We may want to put a hover filter on the map for each cluster so that the user can see a subset of the total addresses, grouped by their geographical proximity.

  • Use as a highlight action: This action is used when the user needs to highlight data points based on the trigger action. The non-highlighted data points still remain on the view but are grayed out, whereas in the case of the filter action the “out data” points are removed from the view temporarily. This kind of action is useful in the following scenarios:
    • The data set is large.
    • The non-highlighted data set is important for spatial reasons i.e. the proximity of highlighted/non-highlighted items is important to the user in his/her analysis.
  • Use as a URL action: This action is used when the user should be redirected to a resource outside the Tableau environment, such as a file path or URL, or when a mail should be sent. A user should be navigated away from the visualization/dashboard only when it is absolutely necessary or is part of the visual discovery process. Therefore, this feature needs to be used with caution. Some sample use cases for this feature are as follows:
    • The user needs to be shown data that cannot be represented in a visual format, e.g. links to legal documentation in the case of contract negotiation dashboards.
    • The user needs to be navigated to a different but relevant data set that is not part of the data set being presented. This could be done in cases where the linked subset of the data is needed for verification purposes.
    • The dashboard/visualization is part of an enterprise application. The data point that is used as the action trigger is passed as a parameter to the enterprise application for further business processing.
    • A user needs to be alerted/informed by mail based on a data point in the dashboard. The email feature can be set up to send an alert/information mail with the data present in the user’s trigger.

Actions provide a new dimension for the user to interact with data, in addition to the visual representation. While setting up interactivity with actions, the points to consider are the number of source/target visualizations, whether filtering applies to each element, and the user workflow.
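
The difference between a filter action and a highlight action can be illustrated outside Tableau with a few lines of plain JavaScript. This is only a conceptual sketch: it does not use any Tableau API, and the data and function names are invented for the example.

```javascript
// Hypothetical marks in the target visualization, using the speed violation example.
const marks = [
  { address: "A", zone: "north" },
  { address: "B", zone: "north" },
  { address: "C", zone: "south" },
  { address: "D", zone: "west" },
];

// Filter action: marks that do not match are removed from the view.
function applyFilter(allMarks, matches) {
  return allMarks.filter(matches);
}

// Highlight action: every mark stays in the view, non-matching ones are dimmed.
function applyHighlight(allMarks, matches) {
  return allMarks.map((mark) => ({ ...mark, dimmed: !matches(mark) }));
}

// The user hovers over (or selects) the "north" cluster in the source visualization.
const inNorth = (mark) => mark.zone === "north";
const filtered = applyFilter(marks, inNorth);
const highlighted = applyHighlight(marks, inNorth);
console.log(filtered, highlighted);
```

The filtered view contains only the two northern addresses, while the highlighted view still contains all four, which is why highlighting suits cases where the position of the other marks matters.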

The Rhythm of Food

Over the years, key food trends can be revealed through Google searches, from the rise and fall of recipes, diets and drinks to cooking trends and regional cuisines.

The first visualization shows the top Google search volumes for different food-related topics, e.g. Veganism, Moscow Mule, Fat-free, etc.

Another great thing about this visualization is that it shows the weekly search trend for a number of dishes and ingredients over the past 12 years. These searches were then plotted on a year clock.

The attractive thing about this visualization is that, once the data is plotted on a year clock, we can investigate the seasons and rhythm of food around the world. It shows when Google searches for a dish or ingredient are at their peak: the distance from the center of the clock gives the number of Google searches, and the volume of searches in particular months highlights the natural season of the food.
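
The year-clock layout boils down to a polar coordinate mapping: the week determines the angle and the search volume determines the distance from the center. The sketch below assumes that the year starts at the 12 o'clock position and runs clockwise, and that the radius is linear in the search volume; the actual project may use different conventions.

```javascript
// Map one weekly observation to x/y coordinates on a year clock.
// Coordinates follow SVG conventions: the origin is the clock centre and y grows downwards.
function yearClockPoint(weekOfYear, volume, maxVolume, maxRadius) {
  const angle = (weekOfYear / 52) * 2 * Math.PI; // 0 = start of the year, at 12 o'clock
  const radius = (volume / maxVolume) * maxRadius;
  return {
    x: radius * Math.sin(angle),
    y: -radius * Math.cos(angle),
  };
}

// Hypothetical values: half of the maximum volume, at the start of the year and after 13 weeks.
const january = yearClockPoint(0, 50, 100, 200);
const april = yearClockPoint(13, 50, 100, 200);
console.log(january, april);
```

Points for the same week in different years land on the same spoke of the clock, so a seasonal food shows up as a bulge in one direction.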

The best part is that the visualization shows which foods are at their peak in each month, revealing the natural seasons of foods such as fruits and veggies.

It also shows the most common patterns across the year, i.e. which items fade in and out over a year and which food items peak during holidays or special events.

It also shows that the seasonality varies across the world.

 

Reference – http://rhythm-of-food.net/

Image Reference – http://rhythm-of-food.net/

Financial KPI Dashboard for Executives

The first thing that comes to mind when thinking of an executive dashboard is tracking fiscal performance. This dashboard displays financial KPIs like current revenue, quick ratio, and short-term assets. The idea behind any executive dashboard is to provide a concise but accurate view of business performance so executives can get the information they need “at-a-glance” to drive the business forward. In the quest for brevity, however, it is important to provide executives with an easy way to get a more granular view of the business.

  • The top left corner shows Sales Growth, which measures the pace at which your organization’s sales revenue is increasing or decreasing. This is a key metric for any organization to monitor since it is an essential part of growth projections and is instrumental in strategic decision-making.
  • The bottom left corner shows the Quick Ratio, which measures the ability of your organization to meet any short-term financial obligations with assets that can be quickly converted into cash.
  • The right part shows Working Capital, which measures your organization’s financial health by analyzing readily available assets that could be used to meet any short-term financial liabilities.
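
The three KPIs above follow standard textbook formulas, sketched below. The function and field names and the sample figures are assumptions for illustration and are not taken from the dashboard itself.

```javascript
// Sales growth: relative change in sales revenue between two periods.
function salesGrowth(currentSales, previousSales) {
  return (currentSales - previousSales) / previousSales;
}

// Quick ratio: liquid assets divided by short-term obligations.
function quickRatio({ cash, marketableSecurities, accountsReceivable, currentLiabilities }) {
  return (cash + marketableSecurities + accountsReceivable) / currentLiabilities;
}

// Working capital: current assets minus current liabilities.
function workingCapital(currentAssets, currentLiabilities) {
  return currentAssets - currentLiabilities;
}

// Hypothetical figures (in thousands).
const growth = salesGrowth(120000, 100000);
const quick = quickRatio({
  cash: 50,
  marketableSecurities: 30,
  accountsReceivable: 40,
  currentLiabilities: 80,
});
const capital = workingCapital(200, 80);
console.log(growth, quick, capital);
```

A quick ratio above 1 means the liquid assets alone cover the short-term liabilities, which is the kind of at-a-glance reading an executive dashboard is meant to support.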

Reference: https://www.klipfolio.com/resources/dashboard-examples/executive/financial-performance

How to create a simple Bar chart using D3.js? Part 1

For many classes I have sat wondering why I should learn D3.js when I can create a simple bar chart in Tableau within seconds. However, after reading a couple of articles online and a few blog posts from this course, I got intrigued and wanted to write up my understanding of how to create a bar chart using D3.js.

D3.js, as we all know, is a JavaScript library that helps visualize data using HTML, SVG and CSS.

All of us are aware of HTML and CSS. So, what is this SVG after all? SVG is an XML-based vector image format with support for interactivity and animation. SVG allows three types of graphic objects:

  • Vector graphic shapes, such as paths (e.g. the outline of your bar graph) consisting of straight lines and curves
  • Bitmap images
  • Text

These graphical objects can be grouped, styled, transformed and composited into previously rendered objects. SVG is like a paint tool that draws your chart based on the measurements you give and the axes and boundaries you define. You can also add interactivity and animation to your chart through JavaScript that accesses the SVG DOM (Document Object Model).

So, to start your bar graph, you first need an svg element in which to plot and render the available data.

<svg id="visualisation" width="1000" height="500"></svg>

This code defines the svg element, a box with the given width and height in which the chart will be drawn.

The next step is to create the x and y axes of your chart, for which you need a fixed range and domain for both axes.

Domain – the minimum and maximum values of the data set displayed on the graph.

Range – the span of positions within the svg onto which the domain is mapped.

How will you fix the domain and the range? What sets that minimum and maximum value? The axis needs to scale according to your data and we need to take this into consideration while setting the values for the domain and the range.
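
To see what a linear scale does with a domain and a range, here is a minimal hand-written version. It is only a sketch of the idea and not D3's own implementation; the function name and the numbers are chosen for illustration.

```javascript
// A hand-rolled linear scale: maps a value from the data domain to a position in the range.
function linearScale(domain, range) {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// x scale: data values from 1 to 100 mapped onto positions 50 to 980.
const x = linearScale([1, 100], [50, 980]);

// y scale: the range is reversed because SVG y coordinates grow downwards,
// so the smallest value sits at the bottom of the chart.
const y = linearScale([5, 60], [480, 20]);

console.log(x(1), x(50.5), x(100)); // left edge, midpoint, right edge
console.log(y(5), y(60));           // bottom, top
```

The domain comes from the data (its minimum and maximum), while the range comes from the layout (the drawable area inside the margins), so changing the data never requires touching the layout.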

For this example, I am using the following dataset:

var lineData = [{ x: 1, y: 5 }, { x: 20, y: 20 }, { x: 40, y: 10 }, { x: 60, y: 40 }, { x: 80, y: 5 }, { x: 100, y: 60 }];

Now that the SVG is defined, the next step is to define margins, because each HTML element is treated as a box on a web page, consisting of margin, border, padding and content. The width and height of the entire chart should also be set initially.

// Note: d3.scale.linear() and d3.svg.axis() belong to the D3 version 3 API.
var vis = d3.select('#visualisation'),
    WIDTH = 1000,
    HEIGHT = 500,
    MARGINS = { top: 20, right: 20, bottom: 20, left: 50 },

    // xRange and yRange are the scales for the x axis and the y axis.
    // Their ranges take the margins into account, and their domains are
    // tied to the data set with d3.min() and d3.max().
    xRange = d3.scale.linear()
        .range([MARGINS.left, WIDTH - MARGINS.right])
        .domain([d3.min(lineData, function(d) { return d.x; }),
                 d3.max(lineData, function(d) { return d.x; })]),

    yRange = d3.scale.linear()
        .range([HEIGHT - MARGINS.top, MARGINS.bottom])
        .domain([d3.min(lineData, function(d) { return d.y; }),
                 d3.max(lineData, function(d) { return d.y; })]),

    // Axis generators; the axes are later added to the SVG element with append().
    xAxis = d3.svg.axis().scale(xRange).tickSize(5).tickSubdivide(true),

    // The y axis needs to sit on the left, hence the orient() call.
    yAxis = d3.svg.axis().scale(yRange).tickSize(5).orient('left').tickSubdivide(true);

The ranges take the defined margins into account so that the axes do not touch the edge of the SVG. Next week I will continue this post by adding the bars of the chart and some CSS styling. I hope this post encourages you all to try a simple chart in D3.js.

Reference : https://www.sitepoint.com/creating-simple-line-bar-charts-using-d3-js/