Insight into the quality of the data is what matters

Data visualizations attract a lot of attention because they are a finished product and look nice as well. However, for many companies engaged in data visualization, those final deliverables aren’t the most important benefit. Instead, it’s the insights into the quality of their collected data that truly lead to success.

Data visualization provides 3 key insights into data:

Insight into Data #1: Is the data complete?

The most straightforward insight that visualization can give you about your data is its completeness. With a few quick charts, areas where data is missing show up as gaps or blanks on the report (called the “Swiss Cheese” effect).

In addition to learning which specific data elements are missing, visualizations can show trends of missing data. Those trends can tell a story about the data collection process and provide insight into changes necessary in the way data is gathered.
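As a rough illustration of the completeness check described above, the sketch below counts missing values per column and per group before anything is charted. The function names, the field names (month, age, zip), and the records are all invented for this example; a real data set would need its own definition of "missing".

```python
from collections import Counter


def is_missing(value):
    """Treat None and empty strings as missing (an assumption for this sketch)."""
    return value is None or value == ""


def missing_share_by_column(rows, columns):
    """Return the fraction of rows in which each column is missing."""
    missing = Counter()
    for row in rows:
        for col in columns:
            if is_missing(row.get(col)):
                missing[col] += 1
    total = len(rows)
    return {col: (missing[col] / total if total else 0.0) for col in columns}


def missing_share_by_group(rows, group_key, column):
    """Return the missing fraction of one column within each group (e.g. per month)."""
    totals, missing = Counter(), Counter()
    for row in rows:
        group = row[group_key]
        totals[group] += 1
        if is_missing(row.get(column)):
            missing[group] += 1
    return {group: missing[group] / totals[group] for group in totals}


# Hypothetical records, invented for illustration only.
records = [
    {"month": "Jan", "age": 34, "zip": "19104"},
    {"month": "Jan", "age": None, "zip": "19103"},
    {"month": "Feb", "age": None, "zip": ""},
    {"month": "Feb", "age": None, "zip": "19107"},
]

print(missing_share_by_column(records, ["age", "zip"]))
print(missing_share_by_group(records, "month", "age"))
```

Plotting the per-group shares over time is one way to surface the kind of missing-data trend mentioned above, such as a field that stopped being collected partway through.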

Insight into Data #2: Is the data valid?

Visualization plays a pivotal role in understanding data’s validity. By running a quick, preliminary visualization on the collected data, you can spot trends that indicate problems in the full data set.

Insight into Data #3: Is the data well-organized?

Poorly organized data can be the bane of the final step of a data collection or business intelligence process. Using data organization tools from the start can help streamline later steps of the process.

During collection, the data is often organized in a way that optimizes the gathering process. However, that same organizational scheme can be a problem when the time comes to act. The data visualization process serves to highlight the organizational challenges of your data and provides insights into how it might be done better.

Source: http://www.boostlabs.com/benefit-of-data-visualization-3-crucial-insights-into-your-data/

Budget Puzzle: Empowering the people with Data

In a democracy, the masses are given the power to choose, but that empowerment is futile when little information is provided about the choices on offer. People have little or no idea about policy-making, the budgeting process, and so on.

In line with the global trend of democratizing access to information and empowering people, this data visualization does an excellent job of demystifying the process of balancing the national budget. By placing budget balancing in the hands of everyday users, this visualization taps into the power of collective thinking to solve big problems.

In this interactive visualization, readers are asked to come up with a set of cuts that would sufficiently reduce the deficit for both 2015 and 2030. Some policies save more money in the near term than others, and some have much larger long-term savings.

The country’s ultimate deficit solution will have to include a mix of medium- and long-term savings. The spending options are broken into four categories: domestic programs and foreign aid; the military; health care; and Social Security. To achieve these savings, readers are asked to choose a mix of tax increases and spending cuts.

Visit this Link – BUDGET PUZZLE 

Bike Share in Philadelphia

This visualization shows station usage data for Indego, a public bicycle sharing system that serves parts of Philadelphia with over 100 stations.

The purpose of this visualization is to help people to plan their commutes and avoid the busy hours in local bike stations.

As it shows, the x-axis represents the time of day. The y-axis is more complex: it represents the percentage of time the station is full or empty at that moment.
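To make the y-axis quantity concrete, here is a minimal sketch of how a "percentage of time full or empty" could be computed per hour from station status snapshots. It is not the author's method; the snapshot layout (hour, bikes available, total docks) and the sample values are assumptions made up for illustration.

```python
from collections import defaultdict


def full_empty_share(snapshots):
    """For each hour, return (share_full, share_empty) over all snapshots.

    Each snapshot is a hypothetical (hour, bikes_available, total_docks) tuple.
    A station counts as full when every dock holds a bike and empty when
    no bikes are available.
    """
    counts = defaultdict(lambda: [0, 0, 0])  # [total, full, empty]
    for hour, bikes, docks in snapshots:
        entry = counts[hour]
        entry[0] += 1
        if bikes == docks:
            entry[1] += 1
        elif bikes == 0:
            entry[2] += 1
    return {
        hour: (full / total, empty / total)
        for hour, (total, full, empty) in sorted(counts.items())
    }


# Invented snapshots for a single 10-dock station.
snapshots = [
    (8, 0, 10), (8, 0, 10), (8, 3, 10), (8, 10, 10),
    (17, 10, 10), (17, 10, 10), (17, 5, 10), (17, 7, 10),
]

for hour, (share_full, share_empty) in full_empty_share(snapshots).items():
    print(f"{hour:02d}:00  full {share_full:.0%}  empty {share_empty:.0%}")
```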

The graph itself uses only one mark and two channels: the mark is area, and the channels are color and color saturation. However, the choice of channel may not be appropriate here. It may confuse people because a deeper red represents more bikes being available, whereas in daily life red usually signals congestion or shortage. The graph also lacks information density. For example, it cannot show how bike usage differs from day to day within a week.

In fact, the author created this graph because he thought his initial visualization was imperfect. After digging into it, though, I don’t agree. Here is his previous visualization.

Reference:

http://www.randalolson.com/2015/09/05/visualizing-indego-bike-share-usage-patterns-in-philadelphia-part-2/

http://www.randalolson.com/2015/07/18/visualizing-indego-bike-share-usage-patterns-in-philadelphia/

Spurious Correlations

In statistics, a spurious correlation is a mathematical relationship in which two or more events or variables are not causally related to each other, yet it may be wrongly inferred that they are, due to either coincidence or the presence of a third, unseen factor.

It’s well known that correlation doesn’t imply causation. However, when lines, bars, and points follow similar trends, we start to believe that one may be the cause and the other the result.
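To see how easily two unrelated series can look related, here is a small sketch that computes the Pearson correlation coefficient by hand. Both series are invented for illustration and simply trend upward over time; the variable names are hypothetical and imply no real relationship.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Two made-up yearly series that both happen to grow over time.
ice_cream_sales = [10, 12, 15, 19, 24, 30]
software_bugs = [3, 4, 4, 6, 7, 9]

r = pearson(ice_cream_sales, software_bugs)
print(f"correlation: {r:.2f}")  # close to 1, yet neither series causes the other
```

A coefficient near 1 here only says that both series rise together. Any shared upward trend, such as growth over time, would produce the same result.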

There are several ways a chart can suggest a spurious correlation:

  1. Axis scales: series whose x- or y-axis scales measure different values should not be paired in a single graph, especially when they show similar curves.
  2. Changed scales: even when the x- and y-axes measure the same value, the scale used for each event may differ in proportion and range. The graphs below show that, plotted on different ranges, the two events appear highly related. Plotted on the same range, however, they appear unrelated.
  3. Ifs and thens implying cause and effect: comparing two unrelated data sets may lead to a misunderstanding of causation. We can use two different ways of presenting the data to examine the causation:

If Pandora loses less money, then more music is copyrighted.

However, this graph doesn’t show that correlation:

Reference: https://hbr.org/2015/06/beware-spurious-correlations

Potential Data Resource from Uber Movement

Uber recently introduced Uber Movement, a website that uses Uber’s data to help urban planners make informed decisions about their cities. With this website, which could also be called a data analytics platform, local leaders, urban planners, and civic communities can more easily work on cracking their city’s commute and figure out how best to invest in new infrastructure.

This website would help us reliably estimate how long it takes to get from one area to another. We can also compare travel conditions across different times of day, days of the week, or months of the year, and see how travel times are affected by big events, road closures, or other things happening in a city.

Uber claims the data is anonymized and aggregated into the same types of geographic zones that transportation planners use to evaluate which parts of cities need expanded infrastructure, without releasing any personal information.

https://movement.uber.com/cities

https://www.wired.com/2017/01/uber-movement-traffic-data-tool/

What does a day in the life of an average American look like?

This stunning dynamic visualization, created using CSS and D3.js, depicts an average day in the life of Americans based on data collected in 2014 by the American Time Use Survey.
Each dot represents a person, and different colors represent different activities such as sleeping or leisure. There is a time tab in the upper left corner, along with transition speed settings (slow, medium, and fast). Every time a person changes task or activity, the corresponding dot moves from one activity to another.
The day starts off slow, as most people are sleeping or just beginning their daily chores. It then moves quickly to the peak rush hours, when people are travelling to work. The day in this dynamic visualization starts at 4:00am and runs for 24 hours. It is an excellent example of how dynamic visualizations can simplify keeping track of thousands of people and their daily activities, which is a strenuous task with conventional visualization methods like bar graphs or pie charts.

Also, this is a good example for understanding how each individual from a data set affects the whole pattern. For example, if 200k out of 1000k people start travelling at 9 am instead of 8 am, there is a monumental change in the trend. The example also contains mapping between activities using lines to better understand the transitions from one activity to another and understanding the most common ones like ‘household care’ to ‘personal care’ to ‘eating and drinking’.

Reference: http://flowingdata.com/2015/12/15/a-day-in-the-life-of-americans/

Introducing My Dashboard

Hey guys, how was your weekend? As for me, I finished my very first dashboard.

This is a static single web page written in HTML, SCSS, and ReactJS. It’s built with webpack and hosted on Heroku. The source code can be seen from here. The images used in the web page are stored in AWS S3 buckets. You can clone my git repository to your local machine and run the code to see how it looks. The instructions are in the “README.md” file in the git repo.

The idea behind using ReactJS instead of plain JavaScript is that we can build modules separately and reuse them, which facilitates web development at large scale. ReactJS also has many other advantages, such as the virtual DOM, great documentation, and relative stability. By using SCSS, we also gain the benefits of OOCSS (object-oriented CSS), a philosophy for writing modular, reusable styles.

After launching the webpage, you can see that I use the percent circle mentioned last week. The elements below those percent circles are still images. The next thing I will do is change them into D3 DOM elements. This will take some effort, and I’ll offer detailed instructions for the process. Please stay tuned for new improvements, and feel free to reach out to me with suggestions, ideas, and anything else! Thank you! 😀

 

Top 5 Tips for Getting the Most Out of Your Tableau Dashboard

Tableau is widely used to create visualizations and dashboards to analyze various kinds of data and derive useful results. There are a few simple tricks to make a Tableau dashboard more efficient:

  1. Do not join different types of sources: While creating visualizations we tend to join different data sets. It is important to consider the file types while performing joins. Joining a database to an Excel file, and then to some other Tableau extract, can be hazardous for Tableau performance, because it takes a lot of time to fetch results from different file types and perform calculations on them. For better performance, avoid joining too many different file types.
  2. Calculations should not be too complex: If the data needs to be processed with many complex calculations, that processing should be done in the source file and not in Tableau. Tableau calculations should be kept short and simple, so that they run faster and return results in a short time.
  3. Reduce the number of reports in the dashboard: Too many reports in a dashboard take a long time to load. The reports in a dashboard should be few and to the point. There is no use in creating a new report to visualize every small detail. Make each report efficient by displaying the right amount of information. This helps improve Tableau performance.
  4. Do not bring in unwanted data: Before loading the data into Tableau, it is important to check whether all of it is necessary for the visualization. If some columns or rows are not needed, they can be filtered out before being brought into Tableau. Too much data increases parsing time, which reduces Tableau efficiency.
  5. Use Parameters instead of Quick Filters: Using parameters helps reduce the load time of the dashboard. The user can enter values in parameters to see the results related to that input. Quick filters display all the possible results a user can see, but this increases the loading time of the report.

By following these simple rules, the efficiency of a Tableau dashboard can be increased.

Reference: https://www.excella.com/insights/top-5-tips-for-getting-the-most-out-of-your-tableau-dashboard

Playing with data to get different interpretations

Last month we created a visualization to depict the daily volume of speed violations that occurred in Children’s Safety Zones for each camera in Chicago. This time, I tried to look at the number of violations in each area of Chicago from 2014 to 2016.

There were a few areas where the speed violations were massive compared to other areas, and those violations kept increasing the following year. There are also areas like “4843 W Fullerton” where the number of speed violations increased by 82211 from 2014 to 2015 but decreased by 15843 in 2016, which may reflect the efforts of the traffic police department to reduce speed violations.

Below is the link which shows the changes in the number of violations in Chicago from 2014 to 2015 and from 2015 to 2016.

https://drive.google.com/open?id=0B8ffu231haBVeHdkYVo0d1YzdDA

To get a picture of the increase and decrease in the number of speed violations in every area, the following steps were carried out in Tableau:

  1. Get the number of speed violations separately for each year based on the Violation Date field. To get only the number of speed violations for a particular year, I created three calculated fields, one each for the violations of 2014, 2015, and 2016.
  2. To achieve this, I extracted only the year from the violation date using the YEAR function. For example, to get the number of speed violations for 2014, the calculated field Violations_2014 was created with the code: IF YEAR([Violation Date]) == 2014 THEN [Violations] END
  3. Once we have the number of speed violations for each year separately, we can take the difference between any two years to show the increase or decrease in the number of speed violations.
  4. To see the increase or decrease in 2015, we can create a calculated field named Difference_2015 with the code: SUM([Violations_2015]) - SUM([Violations_2014])
  5. Repeat step 4 to get the change in speed violations for 2016 with the code: SUM([Violations_2016]) - SUM([Violations_2015])
  6. Once we have the change in speed violations for each year, we can plot the graph with the Address field on the Y-axis and the change in speed violations on the X-axis.
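For readers who want to check the same logic outside Tableau, the steps above can be sketched in plain Python. This is only an illustration of the calculation, not the Tableau workflow itself. The function names, the row layout (address, violation date, violations), and the sample rows are assumptions invented for the example.

```python
from collections import defaultdict
from datetime import date


def violations_by_address_and_year(rows):
    """Sum violations per address and year (mirrors the per-year calculated fields)."""
    totals = defaultdict(lambda: defaultdict(int))
    for address, violation_date, violations in rows:
        totals[address][violation_date.year] += violations
    return totals


def yearly_change(totals, year):
    """Return, per address, the total for `year` minus the total for the previous year."""
    return {
        address: per_year.get(year, 0) - per_year.get(year - 1, 0)
        for address, per_year in totals.items()
    }


# Invented rows for illustration; real data would come from the violations data set.
rows = [
    ("100 N EXAMPLE ST", date(2014, 7, 1), 40),
    ("100 N EXAMPLE ST", date(2014, 8, 1), 60),
    ("100 N EXAMPLE ST", date(2015, 7, 1), 150),
    ("100 N EXAMPLE ST", date(2016, 7, 1), 120),
    ("200 S SAMPLE AVE", date(2014, 7, 1), 30),
    ("200 S SAMPLE AVE", date(2015, 7, 1), 20),
    ("200 S SAMPLE AVE", date(2016, 7, 1), 25),
]

totals = violations_by_address_and_year(rows)
print(yearly_change(totals, 2015))
print(yearly_change(totals, 2016))
```

One design difference worth noting: this sketch treats a year with no records as zero violations, which is a choice made for the example and may not match how a given Tableau view handles missing years.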

This can be done in different ways in Tableau. I followed this approach to get the detailed step by step understanding of the data. Your comments are welcome for any alternate or better approaches to get the same visualization.

Why does your audience matter in data visualization?

Typically a resume is only a single page, as recruiters have only seven seconds to skim through it and decide if you could be a good fit for the company. Similarly, if your audience is the executive board, they have very limited time to glance through your viz, which means you need to present accordingly. Second, consider how much information your audience already has and how effectively you can uncover new insights, answer their questions, and make the argument objective. A chart designed for your manager may not be suitable for your customers. Hence, it’s always better to understand your audience before designing your visualization.

One interesting technique that can help in our storytelling project is to break our charts up into several slides while presenting, and finally show a combined dashboard collating all the sheets, which can portray our story better. This storyboarding technique ensures that your audience looks at the right chart when you want them to. Another good technique for a smaller audience is to draw attention to key charts by handing out curated handouts, which can be saved for future reference. This is a great way to keep your group engaged throughout your presentation.

Reference:  https://www.techchange.org/2015/05/21/audience-matters-in-data-visualization/