nskrithika12 – Dashboards, Scorecards & Visualization

‘Save the pies for dessert’

We all stress about money, so why don't we talk about it? Three out of four Americans regularly stress out about money. That means that at any given time, most of us may be spinning with worry, shame, and anxiety about the very same thing, yet none of us is talking about it.

Salaries at technology companies are sky-high. Companies like Google, Facebook, Apple and Uber have set a trend of offering their employees six-figure paychecks on average.

Let us look in detail at the pay for key roles at the tech giant Google.


What do I like about the pie chart?

Labelling: Each slice is labelled directly, so the reader needs very little effort to match the slices of the pie to the text.
Colour coding: The colours are appealing and effective. Because the colours carry no meaning of their own, a colour-blind reader can still interpret the information in the pie.

What do I not like about the pie chart?

Complexity: Pie charts are poor at communicating data; they take up more space and are often difficult to read. Research suggests that readers find it very hard to compare the sizes of angles when no scale is present. In the above figure, with many arrows pointing in different directions, reading accurate values becomes a herculean task.
Not a proportion of the whole: Pie charts are meant for slices that together form a meaningful whole. In the above chart, the slices represent salaries for disparate positions at Google, and the sum of those salaries has no meaning.
Overlapping slices: The overlapping slices confuse the reader, and the data behind them is hard to decipher. For example, the salaries of the staff user experience designer and the engineering director cannot be read without effort.
Information overload: Research suggests using no more than seven categories in a pie chart, because beyond that it becomes hard for the eye to judge the relative size of each section. The above chart has around 10 slices, which makes it cluttered: the reader struggles to identify proportions correctly, compare categories, or gain any insight from the picture.
Less effective: One objective of a visualisation is to present information in a way that can be read quickly and understood easily. A quick glance at the above chart does not deliver the information effectively.
Missing timeline: The chart does not appear to have a timeline, so the reader cannot tell when these salaries applied. Absolute time adds meaningful context to visualised data, and the reader is deprived of it.
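To make the "not a proportion of the whole" point concrete: a pie chart gives each slice an angle of 360° × value / total. The minimal Python sketch below uses a hypothetical function `slice_angles` and made-up salaries (not the values in the chart) to show that when unrelated salaries are summed, every angle depends on which roles happen to be included.

```python
def slice_angles(values):
    """Return the angle in degrees a pie chart assigns to each value.

    Each slice gets 360 * value / total, so the angles only carry meaning
    when the values are parts of a meaningful whole.
    """
    total = sum(values)
    if total <= 0:
        raise ValueError("a pie chart needs a positive total")
    return [360.0 * v / total for v in values]


# Hypothetical salaries for three roles (illustrative numbers only).
salaries = {"Role A": 100_000, "Role B": 150_000, "Role C": 250_000}
print(slice_angles(list(salaries.values())))  # [72.0, 108.0, 180.0]

# Adding an unrelated fourth role shrinks every existing slice,
# even though none of those salaries changed.
salaries["Role D"] = 500_000
print(slice_angles(list(salaries.values())))  # [36.0, 54.0, 90.0, 180.0]
```

Because the slice for "Role A" halves when another role is added, the angle says nothing about that role's pay on its own.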

How did I redesign this?

1. A range of values: Salaries for each position span a range of values, so a box plot is a good fit. It shows the highest and lowest salary in each group, the median, and the quartiles around it, giving the reader a comprehensive view of what each position in the company has to offer.

2. Comparison of data: With a box plot, salaries can be compared across positions along a common scale, which is much easier than comparing the angles of pie slices.

3. More effective: The reader can grasp the gist of the visualisation quickly and identify the salary range for each position, which makes this an effective visualisation.
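The numbers a box plot draws for each position can be sketched with Python's standard `statistics` module. This is a minimal sketch under stated assumptions: the function name `five_number_summary` and the salary samples are hypothetical, quartiles use the "inclusive" method, and the whiskers are taken to reach the lowest and highest salary (many plotting tools instead cap whiskers at 1.5 × the interquartile range and draw points beyond that as outliers).

```python
import statistics


def five_number_summary(salaries):
    """Return (min, Q1, median, Q3, max) for one position's salaries.

    The box spans Q1 to Q3, the line inside it is the median, and in this
    sketch the whiskers reach the lowest and highest salary.
    """
    if len(salaries) < 2:
        raise ValueError("need at least two salaries per position")
    data = sorted(salaries)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


# Hypothetical salary samples in thousands of dollars (not from the chart).
positions = {
    "Position A": [90, 100, 110, 120, 130],
    "Position B": [150, 160, 175, 190, 240],
}
for name, values in positions.items():
    print(name, five_number_summary(values))
# Position A (90, 100.0, 110.0, 120.0, 130)
# Position B (150, 160.0, 175.0, 190.0, 240)
```

Placing one such summary per position side by side on a shared salary axis is what makes the comparison in point 2 possible.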


https://public.tableau.com/profile/publish/google_salaries/Sheet1#!/publish-confirm

References:

https://www.quora.com/How-and-why-are-pie-charts-considered-evil-by-data-visualization-experts

http://www.insightsquared.com/2014/02/why-pie-charts-are-the-worst/

American Dream disappearing before our eyes.

Housing in the richest American cities is becoming increasingly unaffordable for the middle class who live there. Incomes tend to stay flat while home prices rise steadily.

The article (https://www.theatlantic.com/business/archive/2014/10/why-are-liberal-cities-so-unaffordable/382045/) attempts to establish a relationship between median household income and the percentage of homes affordable to the middle class in metropolitan cities.

The problems with the graph:

Line graphs are good at showing trends; they declutter the graph and emphasise the trend rather than individual data points. The consequence is a loss of visual information when individual data points actually need to be inspected. In the graph under consideration, the author's aim is to set the rising pay trend in the richest metros against the falling affordability of housing. What the graph loses is the actual values of median household income and of the percentage of homes within reach of the middle class.
To compare trends in median household income and home affordability, the author normalises two very different quantities and presents them on the same y-scale. In trying to make a point about trends in opposite directions, the author forces a single visual perspective on the reader for two related but distinct quantities.
The graph makes it hard to compare close data points. For example, it is hard to say which of the two cities, Bethesda or Washington, DC, has the higher percentage of homes affordable to the middle class. Because the data points are not labelled, such comparisons are difficult.
The author wants to show trends, but the data is discrete: each city has one median household income and one affordability percentage. The author presents this discrete data as a continuous line graph and never explicitly marks the individual data points.
The author orders the top 25 cities by median income. It is unclear whether this ordering also reflects how rich the cities are.
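The article does not say exactly how the two series were normalised. As a hedged illustration, assume min-max scaling to the range 0 to 1. The sketch below uses a hypothetical function `min_max_scale` and made-up values for three cities, and shows why this hides the quantities: an income in dollars and a share of homes in percent end up as the same unitless numbers, and the original values cannot be read back from the chart.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return [(v - lo) / (hi - lo) for v in values]


# Made-up values for three cities, ordered by income (illustrative only).
median_income = [100_000, 90_000, 80_000]  # US dollars
affordable_pct = [20.0, 30.0, 40.0]        # percent of homes

print(min_max_scale(median_income))   # [1.0, 0.5, 0.0]
print(min_max_scale(affordable_pct))  # [0.0, 0.5, 1.0]
```

After scaling, an income of $90,000 and an affordability of 30% both sit at 0.5, so the opposite trends are easy to see but the units and actual values are gone.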

How have I improved the graph?

https://public.tableau.com/views/medianincome_affordability/Sheet1?:embed=y&:display_count=yes

Dual-axis graph: When two quantities on different scales are compared on the same graph, a dual-axis chart lets each keep its own scale. The modified graph shows median household income on the left y-axis and the percentage of homes affordable to the middle class on the right y-axis. Both quantities are represented in their own units, and the reader is not forced into a visual perspective by values rescaled to fit a common scale.
Explicit labelling: Labelling data points is important when visualising discrete data. It lets the reader perceive differences between data points, especially small ones. In the modified graph, the reader can now easily tell that Bethesda has 1 percentage point fewer homes affordable to the middle class than Washington, DC.
Discretization of data: The author wants to present a trend using discrete data. How can we present the trend without losing the discrete quality of the data? We show one quantity as a bar chart, the standard format for discrete data, and keep the other in the author's line format but add explicit markers for each data point. This way the intended trends remain visible while the discrete data is still shown.
Color coding: The bar chart and the line chart use complementary colors. The colors visually separate the two scales and their trends, which makes the graph easy to interpret.

References:

http://www.governing.com/gov-data/economy-finance/housing-affordability-by-city-income-rental-costs.html

https://www.usatoday.com/story/money/business/2014/05/13/housing-affordability-worsens/9034185/

Western Movies with Bewildering Plots

The Western is a movie genre that tells stories set primarily in the latter half of the 19th century in the American Old West, often centring on the life of a nomadic cowboy or gunfighter who rides a horse and is armed with a revolver and a rifle (Wikipedia). An article in The Hollywood Reporter on February 28, 2017 (http://www.hollywoodreporter.com/heat-vision/shadow-superheroes-westerns-are-quietly-popular-971841) discusses the resilience of the Western genre across six decades, from the 1960s to the present day. In the article, the author publishes a plot of the year-by-year count of American-produced Western films, with data drawn from Box Office Mojo (shown below).

This stylised stacked bar plot is hard to read at a glance, and it takes extra effort to work out what it is trying to convey. It is confusing in several ways:

A stacked bar plot is used when both the total of each category and its composition are relevant; it is great for visually aggregating each category. In the above plot, however, all the stacked bars appear to add up to the same total even though their totals are numerically different. In addition, each bar represents a particular year in each decade (the first bar represents year zero in all decades, the second bar represents year one, and so on), which is not the information relevant to the article.
The labels at the top of the plot appear to mark the start of each decade, but this holds only for the first bar. Some bars associated with a particular label begin even before the labelled position.
The information is not displayed effectively. The reader has to work to interpret it, whereas a visualisation should be understood at a glance.
The colour palette consists of slight variations of a single colour, which makes it difficult to distinguish the segments of each bar.
Time is spread across two dimensions at different resolutions: years increase one at a time vertically and by decades horizontally. Having one unit of measurement change along two dimensions at different resolutions only adds to the confusion.

Recreating the graph:

Elimination of stacked bars: Grouped bars are preferable to stacked bars here because the aggregate is not relevant to us. Grouped bars let us compare years within a decade and across decades, which is more useful.
Clear labelling: Each decade has its own distinct colour, so the reader can quickly find the decade they are interested in. Labels on the data points keep the plot clean while still detailed and accessible.
Time in one dimension: Grouping the bars keeps time along a single axis, with the two resolutions nested (single years appear as parts of their decade).
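The regrouping behind the redesigned chart can be sketched as a small data transformation: each year is assigned to its decade, which yields one group of bars per decade with the individual years nested inside. This is a minimal sketch; the function name `group_by_decade` and the sample counts are hypothetical, not the Box Office Mojo data.

```python
from collections import defaultdict


def group_by_decade(counts_by_year):
    """Regroup {year: count} into {decade: [(year, count), ...]} sorted by year.

    Each decade becomes one group of bars, so time runs along a single axis:
    decades side by side, with the individual years nested inside each group.
    """
    groups = defaultdict(list)
    for year, count in counts_by_year.items():
        decade = year - year % 10  # e.g. 1967 -> 1960
        groups[decade].append((year, count))
    return {decade: sorted(pairs) for decade, pairs in sorted(groups.items())}


# Hypothetical film counts (illustrative only).
counts = {1961: 12, 1960: 15, 1971: 8, 1970: 9}
print(group_by_decade(counts))
# {1960: [(1960, 15), (1961, 12)], 1970: [(1970, 9), (1971, 8)]}
```

A plotting tool can then draw each decade's list as one cluster of bars in its own colour, which matches the grouped layout described above.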

References:

http://www.hollywoodreporter.com/
http://1010wcsi.com/how-to-fix-each-of-the-7-mistakes-that-ruin-a-good-infographic/

Small Talk, Big Data

– Krithika N.S.

http://hiase.com/uber-no-boys-club-tech-companies/

Silicon Valley tech companies often rank highly in annual "best companies to work for" lists. They're known for their young workforce, inclusive and liberal work culture, and great pay. But one thing even these companies struggle with is gender diversity. In recent years, a number of them have begun addressing the issue by publishing data on their gender diversity. Google released its data in 2014, shortly followed by Facebook, Twitter, LinkedIn, Apple and others. The most noticeable aspect of this data across all of these companies is that women are significantly under-represented in engineering and leadership roles.

Uber went through a lot of turmoil several weeks ago when a female ex-employee wrote an explosive inside story about how women were treated at the company. This led to a series of efforts by Uber to correct how it handled internal issues related to gender. It also led Uber to release a detailed report on the gender breakdown and racial make-up of the company. The above article looks at Uber's gender breakdown report and discusses one aspect of it. The chart in the article shows the percentage of female employees at major tech firms. It presents Uber as having a slightly greater percentage of women employees than other Valley firms, but broadly in line with their trends.

On analyzing the chart, a few things immediately stand out.

The chart is a 3D bar chart. While 3D may give the chart more sparkle, it often gives a distorted view of the data. It was unnecessary here, since the data does not need an extra axis; a simple 2D chart would have done the job for the reader.
Communicating what a chart represents is a key aspect of data visualization, and labels do much of this work. The chart under consideration has three labels: "Series 1", "Series 2" and "Series 3". Little effort is made to explain what they mean. A data-savvy audience might make sense of them, but most readers are outside this group, and to them these labels are confusing.
The chart does not appear to take any chronology or timeline into consideration. Absolute time adds meaningful context to visualized data, and the reader is deprived of it.

Doing It Right

While researching data sets related to gender diversity, I came across another elaborate, thorough, and functional graph, linked below. A correct version of the above analysis would resemble this graph in many ways.

http://www.informationisbeautiful.net/visualizations/diversity-in-tech/

Some of the ways in which this graph interprets and presents the data better are:

The percentages of men and women in the workforce are shown as a stacked bar, which makes them easy to compare.
The graph is easy on the eyes, which makes the important data simple to decode.
The demographic information is clearly labeled by year, so the reader needs no extra effort to place it in time.

References:

https://pxlnv.com/blog/diversity-of-tech-companies-by-the-numbers-2016/

https://www.gooddata.com/blog/5-data-visualization-best-practices