Data Cleaning tips and tricks for our visualization projects

Since many of us have started cleaning data for our projects, I thought I would share a few tips and tricks I came across for cleaning data sets in Tableau.

  1. Get involved with your data

After you have identified your final argument and the claims that support it, the next step is to understand your data. Looking at the column headers may not be enough; it’s a good idea to think through what the data represents. Check the data types and whether the values in each column fall within the expected range (e.g. the price of fruit per kg in a grocery store should fall between $0 and $40). Be wary of empty or null values (e.g. in our first assignment, the null values for x coordinate, y coordinate, latitude and longitude did result in a key insight). You can also try to spot initial patterns in the data set.
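The checks above can also be sketched outside Tableau. The snippet below is a minimal illustration in plain Python, not a Tableau feature: the sample rows, the column name price_per_kg and the helper profile_column are all made up for this example. It counts empty, non-numeric and out-of-range values in one column, using the $0 to $40 range from the fruit example.

```python
# Hypothetical sample rows; column names and values are made up for illustration.
rows = [
    {"fruit": "apple", "price_per_kg": "3.50"},
    {"fruit": "mango", "price_per_kg": "55.00"},  # outside the expected range
    {"fruit": "pear", "price_per_kg": ""},        # empty value
    {"fruit": "kiwi", "price_per_kg": "n/a"},     # not a number
]


def profile_column(rows, column, low, high):
    """Count empty, non-numeric, out-of-range and valid values in one column."""
    report = {"empty": 0, "not_numeric": 0, "out_of_range": 0, "ok": 0}
    for row in rows:
        raw = (row.get(column) or "").strip()
        if raw == "":
            report["empty"] += 1
            continue
        try:
            value = float(raw)
        except ValueError:
            report["not_numeric"] += 1
            continue
        if low <= value <= high:
            report["ok"] += 1
        else:
            report["out_of_range"] += 1
    return report


report = profile_column(rows, "price_per_kg", low=0, high=40)
print(report)
```

A report like this only tells you where to look; whether a null or an outlier is an error or an insight is still a judgment call.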

  2. Never trust your data at first sight

It may happen that the first 50 to 100 rows of your data are well formatted while the remaining rows contain errors that make the data difficult to visualize. It’s also always better to double-check your understanding of the data so that you don’t make wrong assumptions. This will save a lot of data prep time.
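One way to look past the first rows is to scan the whole file for rows that do not match the header. The sketch below is plain Python rather than anything built into Tableau; the CSV text and the helper find_malformed_rows are hypothetical and stand in for a real export.

```python
import csv
import io

# Hypothetical CSV text standing in for a real file: the first rows look clean,
# but a later row is missing a field.
sample = io.StringIO(
    "store,item,units\n"
    "A,apple,10\n"
    "A,pear,4\n"
    "B,apple\n"
    "B,kiwi,7\n"
)


def find_malformed_rows(handle):
    """Return (line_number, row) for rows whose field count differs from the header."""
    reader = csv.reader(handle)
    header = next(reader)
    bad = []
    for row in reader:
        if len(row) != len(header):
            bad.append((reader.line_num, row))
    return bad


bad_rows = find_malformed_rows(sample)
print(bad_rows)
```

For a real file, the same function could be called on an open file handle instead of the in-memory sample.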

  3. Avoid cleaning your data manually

It’s always better to use Tableau’s built-in Data Interpreter for cleaning messy data sets. It’s an easy way to strip out titles, footnotes, empty cells and multi-row column headers and create a usable table. The Data Interpreter is also very useful for extracting sub-tables from Excel files, i.e. when multiple tables are placed on the same sheet and separated by empty rows or columns.

  4. Standardize your data

Use the same naming convention for column headers across all your data sets. If the same column appears in multiple data sets, a standard name makes it easier to remember and to link the data sets. The same applies to data values: CA and California mean the same thing, but Tableau may not recognize them as the same value, so it’s good practice to group them. Also use the same unit for measures that you want to aggregate, for example by applying a calculated field (e.g. the total number of items sold per category and the profit per category in a retail store should both be of integer type).

These data sets cannot be linked without using standards
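If you prefer to standardize before loading the data into Tableau, the idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the lookup table STATE_NAMES, the helper names and the sample headers are hypothetical, and inside Tableau you would use groups or calculated fields instead.

```python
# Hypothetical lookup table; extend it with the spellings found in your own data.
STATE_NAMES = {
    "ca": "California",
    "calif.": "California",
    "california": "California",
}


def standardize_header(name):
    """Lower-case a column header and replace spaces and hyphens with underscores."""
    return name.strip().lower().replace("-", "_").replace(" ", "_")


def standardize_state(value):
    """Map known spellings to one canonical name; leave unknown values unchanged."""
    cleaned = value.strip()
    return STATE_NAMES.get(cleaned.lower(), cleaned)


headers = [standardize_header(h) for h in ["State", " Units Sold", "profit-usd"]]
states = [standardize_state(s) for s in ["CA", "California ", "calif.", "Nevada"]]
print(headers)
print(states)
```

Leaving unknown values unchanged is a deliberate choice here, so that new spellings show up in the output instead of being silently dropped.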

  5. Iterate your data cleaning process

Focus on the main blockers in the first iteration and build your first data visualization. Based on the insights you get, you can then apply better data quality techniques to refine your dashboard and sell your story.

Reference: https://public.tableau.com/en-us/s/blog/2016/05/5-tips-cure-your-data-cleaning-headaches

 

When KPIs fail!

KPIs designed without structure and clearly defined outcomes can lead to a mindless chasing of numbers, resulting in reduced performance. Here are a few bad KPI practices:

1) Using KPIs as targets:

A well-designed set of KPIs serves as a navigation tool that gives everyone an understanding of current levels of performance. If we use KPIs as indicators that are used and owned by everyone to identify areas of improvement, they become powerful enablers of improvement. But if we use KPIs as targets, we get what we measure and nothing else. The article uses the analogy of a torch: used as a target, a KPI shines a spotlight on one spot and leaves the rest of the room in the dark.

2) Measuring everything everyone else is measuring:

Sometimes businesses end up measuring KPIs prompted by external sources or the most recent leadership book. Bad KPIs are detached from the business context and as a result are pointless. In contrast, authors of winning KPIs start with an analysis of the business context, which makes their KPIs successful as a business tool.

3) Not separating strategic KPIs from other data:

The key message of an important strategic KPI is lost when it is lumped into one long KPI report or a huge dashboard. Business leaders are time-poor, so one needs to ensure that the critical KPIs are not lost in a sea of irrelevant information.

4) Hard-wiring KPIs to incentives:

When KPIs are linked to incentives, they stop being a navigation tool and become a target an individual should hit to secure a pay rise or bonus. When this happens, individuals involved can become very creative in how they can manipulate the information to ensure they receive the incentive.

Reference:

www.simplekpi.com/Articles/5-Examples-of-KPI-bad-practise
www.linkedin.com/pulse/20140324073422-64875646-caution-when-kpis-turn-to-poison
www.bscdesigner.com/sound-approach-for-kpis.htm
Key Performance Indicators For Dummies By Bernard Marr

Heatmap of Temperatures

In this blog I would like to analyze a heatmap of “Daily Temperatures & Precipitation” of Sacramento from 1900 to 2015. The visualization consists of 3 separate charts in different tabs for high temperatures, low temperature and precipitation.

The first things I really liked about this chart are the tabs for the different charts, the axis labeling, which is neat and not cramped, and the placement of the legend.

I believe this chart is a perfect example of a heatmap. Even though every daily high/low temperature and precipitation value is recorded in this chart, it does not look cramped because the author chose a heatmap. We can clearly see how the temperature range changes not only across the seasons but also across the years. The High Temperatures chart shows that high temperatures have been consistently increasing from 1900 to 2015: the hottest period has widened from only July – August in the early 1900s to June – October in the early 2000s.

The one thing I would change is the title of the Daily High Temperatures chart, to “Increasing temperature in Sacramento”. As discussed in Tuesday’s class, I believe a title that conveys a message or claim is better than a title that merely describes the chart.

In conclusion, I think this heatmap is an excellent visualization of more than 100 years of temperature data.

Source – http://digitalsplashmedia.com/sacramento-weather-data-visualization/

 

TABLEAU DASHBOARD: Best Practices and Design Principles

While doing my assignment, I did some research on how to design a Tableau dashboard and on the key principles for creating an informative dashboard. I found a video explaining dashboard best practices and decided to write my blog post about it.

Building Dashboards involves creativity, science and art and there are 5 key design principles for designing a Tableau dashboard.

  1. Have relevant metrics: You need to have relevant metrics for your dashboard which align to the overall strategic goal. A good practice is to involve stakeholders at an early stage to identify the required metrics. Also, it is good to remember, if it doesn’t get measured, it doesn’t get improved; hence make sure that the selected metrics are the ones which can be improved or on which corrective action can be taken.
  2. Make it visually pleasing, but do not go overboard:
    The idea of the dashboard is to make it easy for users to compare and remember data. Take advantage of this, but do not go overboard with the charts; try to limit yourself to three to five charts in one frame. Too much information can be confusing and detrimental to the viewer.
  3. Make it interactive:
    Take advantage of Tableau’s features to create a high-level summary of the data, but always allow users to explore the data and get engaged. Give them the opportunity to dig down to the level of detail that meets their needs.
  4. Make it easy to use and access:
    At this point, it is good to consider things like color choices, fonts and layout, and also access. Try to answer the following questions: Will people be able to click on it and immediately access it? Will it be fast? Will it run well?
    The focus should be on creating a positive experience for the audience, so that they can access and use the dashboard easily.
  5. Be open to improvement:
    Be open to improvements and try to collect feedback. Creating dashboards should be a continuous process. Metrics and goals might change, and a good dashboard should keep up with those changes so that it stays relevant.

Keeping in mind the above principles can help us in designing a better dashboard.

Reference: https://www.lynda.com/Tableau-tutorials/Creating-visuals/417094/442256-4.html?autoplay=true

Key Performance Questions

This week we discussed KPIs and business metrics in class. Why are they important and how do they help?

For companies to gain a competitive edge and knowledge from data, they need to completely understand and define their business objectives. These business objectives become the underlying principles on which business metrics and KPIs are defined and dashboards are designed. These help companies achieve their goals by providing direction and guidance.

But how do we make sure that we have the right KPIs?

Many times people pick KPIs randomly and later realize that they are not quite right for them. Or they pick too many KPIs just so that “all angles are covered,” which leads to confusion about what exactly the performance drivers are. Hence, it is very important to decide on the right KPIs that match the strategic objectives.

I landed on this article by Bernard Marr, who has developed an approach to bridge the gap between strategic objectives and KPIs, called Key Performance Questions (KPQs). It is a simple approach that requires you to identify the performance-related questions you need to answer before defining a KPI. Once these questions are decided, management can then ask: what data and information do we need to answer these questions?

We can also use the same for our projects and assignments. Before designing a dashboard, ask yourself questions. We can start with high-level and generic questions which can later evolve into more specific and detailed ideas. Having these questions will give more clarity to what we plan to achieve.

Some example questions: What is the key focus? What value am I trying to bring with this visualization? What is the goal and action? What will be the impact? What data will best represent the case? I believe that the concept of the 5 Whys can also help here. It will point us in the right direction and help us create better visualizations!

Source – https://www.linkedin.com/pulse/20140814161947-64875646-what-the-heck-is-a-key-performance-question?trk=mp-author-card&trk=mp-author-card

 

 

 

Identify True Factors That Lead To Success

A Key Performance Indicator (KPI) is a measurable value that demonstrates how effectively a company is achieving key business objectives. Companies use KPIs to evaluate how successful they are at particular activities. Well-designed KPI dashboards provide greater structure and context to the organization and show how the performance of certain KPIs impacts other KPIs. However, identifying the right KPIs for the business is challenging. There are three biases that may lead to ineffective KPIs:

  1. Overconfidence: people are so confident in their judgments that their beliefs about their abilities conflict with reality. For example, the managers of a fast-food chain found that customer satisfaction was highly related to profitability and believed that low employee turnover would keep customers satisfied, but when they worked on lowering the overall turnover rate, it didn’t help. It turned out that turnover mattered only for manager positions.
  2. Availability: people assess the cause or probability of an event on the basis of similar examples that come to mind, follow a certain pattern, and overlook other important information.
  3. Status quo: most people would rather stay the course than face the risks that come with change. Executives stick with existing metrics instead of changing to more suitable ones.

How do we avoid these biases? Just as when designing a dashboard: first, define the objective. Second, develop a theory of cause and effect and identify the drivers of the objective. Third, identify the specific actions that the audience can take to achieve that objective. Last but not least, regularly reevaluate the statistics.

 

Reference:

https://www.smartsheet.com/all-about-kpi-dashboards

https://www.klipfolio.com/resources/articles/what-is-a-key-performance-indicator

https://hbr.org/2012/10/the-true-measures-of-success

 

 

 

Modern Approach for Data Visualization

There are a variety of conventional ways to visualize data, such as bar charts, pie graphs, and pivot tables. But we also have other, more creative options for visualizing data, which are a lot more fun. This blog shows some examples of these ideas.

1. Trend map

The trend map presents the most popular websites in different categories and how they link to each other.

2. Visual hills for density

This graph uses visual hills (spikes) on a map to emphasize the density of the American population. It is clear that the population density is high in the northeast area and Chicago.

3. Heat map

This visualization uses a heat map to show visitors’ behavior. Sectors highlighted in “warmer” colors are more popular, which means visitors click them more often. It’s an interesting and more straightforward approach to web analysis.

Reference: https://www.smashingmagazine.com/2007/08/data-visualization-modern-approaches/

Valentine’s Day spending by Americans

The most loving day of the year was celebrated this week: Valentine’s Day. The spending on cards, overpriced flowers, chocolates, chilled champagne and the fantastically romantic dinner date is done. Let’s get a sense of how expensive Valentine’s Day can get. The visualization below depicts Valentine’s Day spending by Americans.

What I like about this visualization:

  • Color that matches the theme
  • Precise titles show what we are about to see
  • Nice description which shows us the goal
  • Donut chart works well here as it’s only 2 slices

Possible improvements:

There is very little context to help us reach our goal and take proper action. We cannot tell whether this spending is increasing or decreasing compared to previous years. Historical spending figures would help give a proper picture.

As discussed in class, attributes that do not differ much from one another can be grouped. Here we could make two groups: significant other and everyone else.

The bubble charts used to compare the sizes of the spending could be replaced by a simple bar graph, which would be easier to read. And although the color matches the theme, this is a lot of pink.

The data seems incomplete since it only shows spending on gifts but not the other expenses of flowers, chocolates, holiday, dinner etc. which are overpriced during Valentine’s Day.

I found the depiction of Valentine’s Day spending at the link below to be simpler and better:

https://nrf.com/resources/consumer-data/valentines-day

But overall we can say that love is not likely to be a cheap thrill on Valentine’s Day.

Sources:

http://www.karbelmultimedia.com/2015/02/valentines-day-spending-infographic/

https://nrf.com/media/press-releases/cupid-shower-americans-jewelry-candy-this-valentines-day

 

World’s Biggest Data Breaches

Data breaches are highly damaging for both the company and its consumers. This interactive bubble chart depicts the biggest data breaches that have occurred. Each bubble represents a company that suffered a data breach. The visualization uses a time scale as the y-axis, so the breaches are arranged by the year in which they occurred. It also provides further elements to filter and categorize the data. For example, the bubble color can show either ‘year’ or ‘method of leak’, and the bubble size can show either ‘no of records stolen’ or ‘data sensitivity’.

Things I liked:

  • Firstly, by the time scale, we can easily identify that the number of data breaches has drastically increased over the years which raised a lot of concerns.
  • Secondly, the visualization portrays a complete picture of the data breaches and covers every aspect ranging from the method of the leak to the sensitivity level of data.
  • Thirdly, hovering over each bubble provides details of the breach and on clicking first time it provides a summary of the event, but on clicking the second time it redirects to the actual news article.
  • We can select any combination of the four categorizing indicators mentioned above.
  • A legend provides filtering options based on the type of industry and the type of data leak.

Things that can be improved:

  • The color range used for depicting ‘year’ is very subtle, which makes the years difficult to distinguish.
  • Attention is drawn to the bubbles in orange, which are predefined as interesting stories.

Reference: http://www.informationisbeautiful.net/visualizations/worlds-biggest-data-breaches-hacks/

 

Why I recommend Gapminder Tools Offline

Hans Rosling’s videos were amazing. He used Trendalyzer to show us how to take interactive visualization to a different level. Today I’d like to explain why I recommend Gapminder Tools Offline, an upgraded version of Trendalyzer.

  1. Easy bookmarking. With Gapminder Tools Offline, users can not only create statistical animations but also bookmark the specific points in time they want to return to. I like this function because it makes my presentation more straightforward, and I can show important factors like KPIs without digging through different story pages.
  2. Interactive bubble presentation with color. With this software, users can easily generate interactive moving bubble charts with vivid colors. The bubbles look as if they are flying across the sky, which gives the audience a friendly scene during the presentation. It saves both presenters and the audience time because the moving bubbles themselves convey the historical trend.
  3. Offline tool. The software offers not only online tools but also offline tools. It allows users to prepare a presentation without needing an Internet connection, which helps when your boss asks you to present a story while you are traveling somewhere without Internet access.

Reference:

  1. The best stats you’ve ever seen
  2.  http://www.makeuseof.com/tag/awesome-free-tools-infographics/