Lying with a Truncated Y-Axis

Data visualization is one of the most important tools we have to analyze data. But it’s just as easy to mislead as it is to educate using charts and graphs. In this article we’ll take a look at one of the most common ways in which visualizations can be misleading.

Truncated Y-Axis

One of the easiest ways to misrepresent your data is by messing with the y-axis of a bar graph, line graph, or scatter plot. In most cases, the y-axis ranges from 0 to a maximum value that encompasses the range of the data. However, sometimes we change the range to better highlight the differences. Taken to an extreme, this technique can make differences in data seem much larger than they are.

Let’s see how this works in practice. The two graphs below show the exact same data, but use different scales for the y-axis:

On the left, we’ve constrained the y-axis to range from 3.140% to 3.154%. Doing so makes it look like interest rates are skyrocketing! At a glance, the bar sizes imply that rates in 2012 are several times higher than those in 2008. Displaying the data with a zero-baseline y-axis paints a more accurate picture, in which interest rates are essentially flat.
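
The distortion comes from the fact that a bar's visible height is its value minus the axis baseline, not the value itself. The sketch below compares the apparent ratio of two bars under a zero baseline and under the 3.140% baseline. The two rates (3.142% and 3.152%) are hypothetical values chosen to fall inside that range; the actual figures are not given here.

```python
def apparent_ratio(low: float, high: float, baseline: float = 0.0) -> float:
    """Ratio of the visible bar heights when the y-axis starts at `baseline`."""
    if min(low, high) <= baseline:
        raise ValueError("both values must lie above the axis baseline")
    return (high - baseline) / (low - baseline)


# Hypothetical interest rates (percent) inside the 3.140%-3.154% window.
rate_2008 = 3.142
rate_2012 = 3.152

honest = apparent_ratio(rate_2008, rate_2012)                    # zero baseline
truncated = apparent_ratio(rate_2008, rate_2012, baseline=3.140)  # cropped axis

print(f"zero baseline:  the 2012 bar is {honest:.3f}x the 2008 bar")
print(f"truncated axis: the 2012 bar is {truncated:.1f}x the 2008 bar")
```

With these assumed values, the same 0.01-point difference looks like a roughly 0.3% increase on a zero baseline but like a six-fold jump on the truncated axis.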

Reference: https://blog.heapanalytics.com/how-to-lie-with-data-visualization/

CORE PRINCIPLES OF DATA VISUALIZATION

In a beautiful and apt analogy, Stephen Few, the principal of Perceptual Edge, captures the purpose of data visualization: visualizations are just tools. Just as tools make it easier to build houses, visualizations make it easier to portray complicated data. Visualizing data serves its purpose when it takes the burden of effort off the brain and puts it on the eyes. To accomplish that, he recommends a few core principles:

SIMPLIFY – Perhaps the most important principle of all: simplifying complicated data should be the first purpose of any data visualization. Choosing the right type of representation and including only the most essential data helps simplify it. Care should be taken, however, that this does not lead to oversimplification or the omission of important data.

COMPARISON – When using data visualization to compare and contrast two sets of data, it is essential to place them side by side so they are easy to compare. The human brain finds it difficult to hold the comparison data in memory, and it is cumbersome to keep flipping back and forth to compare.

ATTEND – Data should be visualized in such a way that the audience takes in the most important data at first glance. Highlighting the important part of the data, or using tools and techniques to emphasize the principal parts, serves this purpose admirably.

EXPLORE – A good data visualization should let viewers grasp the information it was meant to convey just by looking at it. It should also be flexible enough to support directed or exploratory analysis with ease. A good visualization tool should be designed with this in mind as well.

DIVERSITY – The data behind a visualization may have several facets, and different views of the same data may yield different insights. A data visualization becomes more effective if it lets viewers analyze the same data from different perspectives and see how those views relate to each other.

THE WHYs OF DATA – In addition to showing the data, an ideal data visualization should also encourage viewers to look for the reasons behind the way the data is distributed.

SOLUTION TO POSSIBLE QUESTIONS – Most of the time, viewers accept the data shown in a visualization without question. But a good data visualization should be designed to answer the questions viewers are likely to ask. This can be achieved by adding more filters and other interactive software features.

In Stephen Few’s words, the best software is the kind you don’t realize you are using. By putting these basic principles into practice, anyone can design data visualizations that work this way.


How the Recession Reshaped the Economy

The chart in this visualization shows how, five years after the end of the Great Recession, the American economy has regained the nine million jobs it lost during the recession. The key feature of this recovery is that not all industries recovered their lost jobs equally. Each trend line below shows how the number of jobs in a particular industry changed over the past 10 years. The claim of this visualization is that the recession reshaped the nation’s job market, industry by industry.

Overview of the visualization

  • The creator of this visualization has categorized the jobs by industry.
  • The legend shows what happened to jobs in each industry over about a decade. It covers the full range of outcomes: jobs that recovered and grew, jobs that only recovered, and jobs that declined because of the recession.
  • The X-axis shows the change in wages by industry, and the Y-axis shows the change in the number of jobs in each industry since the recession.
  • The line plots use an exploded view, breaking the overall trend down by industry and highlighting notable patterns in the data.
  • Scrolling down reveals charts showing the job-market trend for each industry. The creator has drilled down to show the underlying information in detail, and each chart shows whether the jobs in a particular industry have recovered and grown, only recovered, or declined. Each chart also shows the current number of jobs and the average salary for that industry. Hovering over a line chart updates the number of jobs and the average salary for the selected time period.
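
The grouping into jobs that recovered and grew, only recovered, or declined can be expressed as a simple rule that compares current jobs with the pre-recession level. The sketch below is hypothetical: the 2% growth threshold, the `IndustryJobs` and `classify` names, and the job counts are illustrative assumptions, not the NYT's actual method or data.

```python
from dataclasses import dataclass


@dataclass
class IndustryJobs:
    name: str
    pre_recession: int  # jobs just before the recession (hypothetical figure)
    current: int        # jobs at the time of the chart (hypothetical figure)


def classify(industry: IndustryJobs, growth_threshold: float = 0.02) -> str:
    """Hypothetical rule: label an industry by comparing its current jobs
    with its pre-recession level."""
    change = (industry.current - industry.pre_recession) / industry.pre_recession
    if change >= growth_threshold:
        return "recovered and grown"
    if change >= 0:
        return "recovered"
    return "declined"


industries = [
    IndustryJobs("Industry A", 1_000, 1_100),
    IndustryJobs("Industry B", 1_000, 1_005),
    IndustryJobs("Industry C", 1_000, 900),
]
for industry in industries:
    print(f"{industry.name}: {classify(industry)}")
```

A real chart would also need to handle industries that recovered and then slipped again, which a single before-and-after comparison like this cannot distinguish.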

Conclusion

This visualization follows these principles of visualization design:

  • Overview first
  • Function First, Form Second

Charts like this can help in preparing for the next financial crisis and in building programs to support economic recovery.

Reference – https://www.nytimes.com/interactive/2014/06/05/upshot/how-the-recession-reshaped-the-economy-in-255-charts.html

Benefits of Dashboards in the Business World Today

Dashboards are extremely important in today’s businesses. Below are some of their benefits:
1. Total visibility into the business: With a dashboard, you know exactly what is going on in your business at all times. How good were sales last quarter or last year? How is the marketing going? How are customers responding to the new product? A dashboard makes it easy to compare such trends.

2. Big time savings: Reports can be automated and results viewed live. This saves a huge amount of time, which can then be spent on other useful work.

3. Improved results: Once you see your key metrics on the dashboard, you intuitively start improving your results. You start working better, trying to make that sales or profit graph go up.

4. Reduced stress: You can scan every aspect of your business to see how you are doing. If there’s a problem, you’ll know exactly whom to contact to fix it. This makes running the business easier and less stressful.

5. Increased productivity: You can measure performance numerically. When employees see their results as numbers, they naturally work hard to improve them and try to make sure they don’t have red arrows (a mark indicating poor performance) anywhere.
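
The red-arrow convention in point 5 boils down to a period-over-period comparison. A minimal sketch, assuming a dashboard tile that compares a metric with its value in the previous period; the `kpi_indicator` function, its labels, and the sales figures are hypothetical.

```python
def kpi_indicator(previous: float, current: float) -> tuple[float, str]:
    """Percent change versus the previous period, plus the arrow a
    dashboard tile might show (hypothetical labels)."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    pct = round((current - previous) / abs(previous) * 100, 1)
    if pct > 0:
        return pct, "green up arrow"
    if pct < 0:
        return pct, "red down arrow"
    return pct, "no change"


# Hypothetical quarterly sales figures.
print(kpi_indicator(120_000, 138_000))  # (15.0, 'green up arrow')
print(kpi_indicator(120_000, 102_000))  # (-15.0, 'red down arrow')
```

Dividing by the absolute value of the previous figure keeps the sign of the change meaningful even for metrics that can be negative, such as profit.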

Reference:

6 Benefits to Building Your Dashboard Today

A Virtual Reality Guided Tour of 21 Years of the Nasdaq

The virtual reality tour of market data lets readers literally ride the Nasdaq through 21 years of growth and collapse. The “roller-coaster” conceit pairs well with the Nasdaq data, which rose through the dot-com boom of the late 1990s and then went bust. The slow recovery over the following 15 years culminated when the index surpassed its previous peak, which is when this project was published. The visceral sense of height helps readers understand how precarious the dot-com boom was, and the plunge that follows lets them experience a sense of fear and uncertainty.

This project uses true 3D to immerse users in a world populated with this data visualization. Users can optionally place their phone in a Google Cardboard or any other 3D viewer for a fully immersive experience that tracks their head movements and shows each eye a slightly different image, simulating real 3D. Without a viewer, readers can still move their phones around to look across the 360-degree world; on the desktop, they can click and drag with the mouse. Holding one's gaze on a button triggers its action, so readers can bypass more complicated clicking interfaces.

The project is built with three.js, a relatively new library that lets programmers render three-dimensional content in the browser. The chart itself was created with D3.js, and its output was fed into the 3D environment.
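
Feeding a D3 chart into a 3D scene relies on scales that map data values onto visual coordinates, which is what D3's linear scales do. The sketch below reproduces that idea in plain Python to turn index values into track heights; it is not the WSJ's code, and the `linear_scale` helper, the closing values, and the 0 to 100 height range are hypothetical.

```python
def linear_scale(domain: tuple[float, float], output_range: tuple[float, float]):
    """Return a function that maps domain values linearly onto output_range,
    similar in spirit to d3.scaleLinear().domain(...).range(...)."""
    d0, d1 = domain
    r0, r1 = output_range
    if d0 == d1:
        raise ValueError("domain must span a non-zero interval")

    def scale(value: float) -> float:
        t = (value - d0) / (d1 - d0)  # position within the domain, 0..1
        return r0 + t * (r1 - r0)

    return scale


# Hypothetical index closes and a hypothetical track height of 0-100 scene units.
closes = [1000.0, 2500.0, 5000.0, 1500.0, 5100.0]
to_height = linear_scale((0.0, max(closes)), (0.0, 100.0))
heights = [round(to_height(c), 1) for c in closes]
print(heights)  # [19.6, 49.0, 98.0, 29.4, 100.0]
```

Starting the domain at zero keeps each height proportional to the index value, so the ride's drops reflect the real size of the collapse rather than an exaggerated one.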

Reference: http://graphics.wsj.com/3d-nasdaq/

Using interactivity to explain a complex topic: Why Buses Bunch

We have seen countless examples of visualizations that represent business data, i.e., visualizations that show metrics or offer a point of view. However, another very important use of visualization is visual discovery: we can use visualization to explain complex topics through techniques such as storytelling or gamification. The visualization discussed in this post is one such example. It explains what “bus bunching” is through a simple visual game with analytics. Before I critique it, let me explain what bus bunching means. It occurs when a bus is delayed and then several buses on the same route arrive in quick succession.
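
The mechanism behind bus bunching can be shown with a deliberately simplified headway model. A bus that falls behind finds more passengers waiting, so it spends longer at each stop and falls further behind, while the bus following it finds fewer passengers and catches up. The sketch below assumes two buses on a loop, a dwell time proportional to the gap since the previous bus (`beta` is a hypothetical boarding factor), and no passing; it illustrates the idea and is not how the setosa.io simulation is implemented.

```python
def simulate_headways(cycle: float, initial_gap: float, beta: float,
                      stops: int) -> list[float]:
    """Gap (minutes) between bus A and the bus ahead of it, stop by stop.

    Simplified model (an assumption): two buses share a loop, so their
    gaps always sum to `cycle`. Each bus dwells for beta * (its gap),
    because passengers pile up while no bus is at the stop.
    """
    gaps = [initial_gap]
    gap = initial_gap
    for _ in range(stops):
        other = cycle - gap              # gap in front of the other bus
        gap += beta * (gap - other)      # A's extra dwell relative to the other bus
        gap = min(max(gap, 0.0), cycle)  # gap behind A hits zero: buses bunched
        gaps.append(gap)
    return gaps


# Hypothetical: 20-minute loop, bus A starts 1 minute late (11 vs. 10 min gap).
for stop, gap in enumerate(simulate_headways(20.0, 11.0, beta=0.2, stops=8)):
    print(f"stop {stop}: gap ahead {gap:5.2f} min, gap behind {20.0 - gap:5.2f} min")
```

In this model any imbalance grows by a factor of 1 + 2 × beta at every stop, so with the assumed values the gap behind bus A reaches zero by the seventh stop and the two buses travel together. A perfectly balanced start (10 and 10 minutes) stays balanced, which is why even a small delay is enough to trigger bunching.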

What I liked about this visualization

  • The example used to explain the topic is very simple: two buses and four bus stops. The starting point is something anyone with little or no knowledge of the topic can understand. However, the author lets users complicate the scenario through interactivity, by clicking on the interactive dashboard that shows the bus number and the passenger count.
  • Users can also view the history of passenger wait times in an area chart that appears when they hover over any of the bus stops.
  • The instructions are laid out in a clear and concise manner, without disrupting the user’s attention.
  • There are also interactive controls to play, pause, and reset the simulation.

What I disliked about this visualization

  • The visualization does not account for passengers getting on and off. The number of passengers at bus stops usually varies, but this is not reflected, so we can only see the bus-bunching phenomenon itself.
  • The bus stops show multiple data points (as circles) while the game plays. However, it is not clear whether these represent the total number of passengers who get on or off at that stop.

I would like to point out that visualizations of this kind should be used for educational purposes, to simplify complex topics. For them, the choice of visualization tool is very important, since traditional idioms such as line, bar, or area charts may not work, and creating custom idioms is very challenging. For such scenarios, libraries such as D3.js help in creating innovative idioms like the one discussed in this post.

References:

http://setosa.io/bus/

Should I learn more visualization tools?

In recent years, the internet has seen a surge in the number of dashboards and visualizations: some communicate a message succinctly, yet far too many are colorful and pretty but don’t serve their intended purpose well. Thanks to the availability of data and the proliferation of “big data” visualization tools, it has become extremely easy for data enthusiasts, amateurs and experts alike, to create dashboards and share them across the web. While the growing enthusiasm for data visualization is definitely encouraging, it is important to understand the framework that underpins a great visualization in order to make a meaningful impact. The problem begins when we skip learning concepts and jump directly to learning tools.

Personally, until a few months ago, I was under the impression that I could learn the art of visualizing data simply by learning a tool such as Tableau or QlikView. Over the past few weeks, however, I have come to realize how important it is to understand the underlying foundations and frameworks in order to create an effective visual that truthfully communicates a trend or a claim. For example, I had never given much thought to who my audience was, or to how my visualizations should be driven by the audience and not the other way around. It is worth noting that the fundamental concepts and frameworks of visualization remain the same no matter which tools we use. Hence, tools without frameworks are mere tools that serve little purpose.

Reference:

www.daydreamingnumbers.com/blog/learn-concepts-not-only-tools/

5 TIPS FOR CREATING EFFICIENT WORKBOOKS

  1. Think strategically about the data you absolutely need. Reduce the size of the data set by removing irrelevant data from the file. For example, drop data from three years ago if your analysis covers only the current year; this alone may remove a third or more of the data before you even start. You can also use aggregation to reduce the number of records. Prepare the data before it gets to Tableau.
  2. Limit filters. One way to reduce filters on a dashboard is to use dashboard actions instead. Using a sheet as a filter or adding a filter dashboard action that runs on hover or select provides a more efficient means for filtering the rest of the dashboard.
  3. Reduce the number of marks. The more marks that need to be processed, the longer it may take for the visualization to appear. Depending on your analysis requirements, it may not always be possible to reduce the number of marks on a view, but sometimes there is an opportunity to change the level of detail to improve efficiency. Consider ways you can aggregate data points into hierarchies and / or make the analysis less granular.
  4. Write efficient calculations. While calculations are very powerful, they can come at a cost to the efficiency of the workbook. Not all data types are equally efficient; from most to least efficient, they rank: Boolean > Integer > Float > Date > Date Time > String.
  5. Reduce sheets, dashboards, and data sources. This tip not only helps with efficiency but also helps you keep your sanity and improves the end-user experience. If you do have several connected dashboards, create a navigation dashboard that helps end users locate the views most relevant to their specific business questions. The same technique can be used within individual dashboards (e.g., add a URL action set to run on Menu that links the end user to another dashboard or to additional information).
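
Tips 1 and 3 both amount to shrinking the data before Tableau ever sees it. A minimal Python sketch, assuming transaction-level rows and an analysis of the current year only; the row layout, the `prepare_extract` helper, and the figures are hypothetical, and in practice this step might live in a database view or an ETL tool instead.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction-level rows: (order date, region, sales amount).
rows = [
    (date(2021, 3, 14), "East", 120.0),
    (date(2023, 1, 5), "East", 80.0),
    (date(2023, 1, 19), "East", 45.0),
    (date(2023, 1, 22), "West", 60.0),
    (date(2023, 2, 2), "West", 30.0),
]


def prepare_extract(rows, year):
    """Keep only `year` (tip 1) and pre-aggregate to month x region (tip 3)."""
    totals = defaultdict(float)
    for order_date, region, amount in rows:
        if order_date.year != year:
            continue  # drop history the analysis does not need
        totals[(order_date.strftime("%Y-%m"), region)] += amount
    return dict(sorted(totals.items()))


extract = prepare_extract(rows, 2023)
print(f"{len(rows)} rows in, {len(extract)} rows out")
for (month, region), total in extract.items():
    print(month, region, total)
```

Aggregating to month and region trades detail for speed: fewer rows means fewer marks to draw, but any view that needs daily values would have to go back to the raw data.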

Reference: http://www.evolytics.com/blog/tableau-201-5-tips-creating-efficient-workbooks/

Ways not to use Tableau

So far we have seen many ways in which Tableau can be applied to data, but users often use it in the wrong way and expect it to perform operations it was not designed for. Let us look at some of these misuses and identify ways to avoid them.

  1. Using Tableau as an online Excel: Many users try to convert Excel spreadsheets into Tableau worksheets and publish them on Tableau Server so that other users can interact with the data using filters. But not all of Excel’s features can be replicated easily in Tableau, so users tend to blame Tableau for lacking the functionality they need. Tableau and Excel are not built to do the same work and therefore have different, complementary features. Solution: Static, table-like reports should be created with traditional BI tools, and dynamic visualizations and dashboards should be built in Tableau.
  2. Building business applications on top of Tableau: Tableau is not a document management tool, a project management tool, or a collaboration platform for applications. Solution: Use Tableau for data analytics and data visualization, and have a development team build custom applications.
  3. Using Tableau Desktop as an ETL tool: Users bring data in from an Excel file, perform the calculations that are easier in Tableau, and expect to move the results back into an Excel file for further analysis. Tableau is not designed for this round trip, and when it falls short, users see it as a shortcoming of Tableau. Solution: ETL should be performed with tools such as Alteryx, Informatica, Microsoft SSIS, and Pentaho, while Tableau users stick to data analytics and visualization.
  4. Exporting Tableau dashboards to PDF or image: Users export a dashboard as a PDF or an image to include it in a static document, which strips away its interactivity. Solution: To retain the dashboards’ interactivity when sharing them, use Tableau Server or Tableau Online and avoid pitfalls in decision-making.
  5. Unlimited Tableau Reader users: Analysts often share Tableau workbooks from a production environment with many users on a day-to-day basis. These workbooks contain company-specific data that risks leaking outside the company. Because the data is refreshed every day, the analyst has to resend the updated workbook each time, which is cumbersome and adds to the risk. Solution: Use Tableau Server or Tableau Online to publish and share interactive dashboards and to avoid the risk of leaking data.

Tableau is not a tool for creating data or producing tables, and it should not be used to modify or model data. Instead, Tableau users should connect it to raw data and harness its analytics capabilities to produce dynamic visualizations and dashboards.

Source: https://www.linkedin.com/pulse/five-reason-how-you-should-use-tableau-hrvoje-gabelica

Visualizing Big Data in Healthcare using Circos

The healthcare field has always been a favorite of analysts and data experts. The field is rich in large data sets, what we now call “Big Data”. Analyzing large, multi-dimensional data in genetics, genomics, and biotech has always been a challenging task for analysts.

Most of the time, clinical data sets have many fields and a lot of unstructured data. Healthcare data visualizations can be tricky and require great care in selecting the relevant data fields, measures, and indicators. Because so much data is involved, structuring and visualizing it is a challenging process in itself. Faced with a pile of data full of insights, technical experts like us find it difficult to understand the business needs behind the clinical data and present them. The data can be of various types, such as DNA types, gene types, genome classifications, and virus classifications. The person creating the visualizations and analysis may not be familiar with these biological terms and their significance, which makes analyzing healthcare data even more difficult. Of the many data visualization tools and software packages on the market, one stands out for healthcare data.

Circos is an open-source software package that visualizes data in a circular layout, and it is mostly promoted for visualizing data with complex relationships between objects or positions. Circos is well suited to creating visualizations and illustrations with a high data-to-ink ratio [1] and multi-layered data attributes, which makes it ideal for clinical data analysis. Thus, for data science professionals in healthcare and biotechnology, Circos is touted as an important tool for making their tasks simpler.
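
The core idea of a circular layout is that each segment (for example, a chromosome) gets an arc whose size is proportional to its length, and any position within a segment maps to an angle on the circle. The sketch below shows that mapping in Python; it does not use Circos itself (a Perl program driven by configuration files), and the segment names, lengths, gap size, and helper functions are hypothetical.

```python
import math


def circular_layout(segments: dict[str, int],
                    gap_deg: float = 2.0) -> dict[str, tuple[float, float]]:
    """Give each segment an arc (start, end in degrees) proportional to its
    length, leaving `gap_deg` degrees after each segment."""
    total = sum(segments.values())
    usable = 360.0 - gap_deg * len(segments)
    arcs, angle = {}, 0.0
    for name, length in segments.items():
        span = usable * length / total
        arcs[name] = (angle, angle + span)
        angle += span + gap_deg
    return arcs


def position_to_xy(arcs: dict[str, tuple[float, float]], segments: dict[str, int],
                   name: str, pos: int, radius: float = 1.0) -> tuple[float, float]:
    """Map position `pos` inside segment `name` to (x, y) on a circle of
    `radius`, using the usual counterclockwise math convention."""
    start, end = arcs[name]
    theta = math.radians(start + (end - start) * pos / segments[name])
    return radius * math.cos(theta), radius * math.sin(theta)


segments = {"seg1": 250, "seg2": 150, "seg3": 100}  # hypothetical lengths
arcs = circular_layout(segments, gap_deg=4.0)
for name, (start, end) in arcs.items():
    print(f"{name}: {start:6.1f} to {end:6.1f} degrees")
```

Data tracks such as histograms or heat maps would be drawn at other radii, and a link between two positions, the kind of chord Circos is known for, would connect two points computed with `position_to_xy`.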

References: http://circos.ca/

http://www.mastersindatascience.org/blog/10-cool-big-data-visualizations/