Things to know about Data Storytelling

Data analysis involves more than making visualizations and quantifying data. The data analyst should be able to tell a story through all the analysis he or she has done. The story should be credible and should convince the reader of the storyteller’s expertise.

  1. Storyteller is the expert: While designing any visualization, the designer should know the data well. Even when transforming the data, the designer should know where to draw the line and keep the data relevant. People rely on the information the storyteller provides, so its accuracy must be kept intact.
  2. Know your audience: It is important for the storyteller to know the audience. The story should include the information that will draw their attention and keep them engaged. The representation should be simple enough that they don’t have to be experts on the subject being discussed.
  3. Story should have context: While creating a story, the storyteller may assume certain things and forget to give context for the analysis. The audience does not have complete information about the subject and may not understand the context behind the story. When stating any correlation, the storyteller should explain why the two values are being related so that there is no doubt in the audience’s mind.
  4. Design matters: The brain can understand visualizations faster than numbers, so the visualization design should be simple but effective. The audience should not be bombarded with information, and a dashboard should group only related pieces of information. Comparative charts are useful in storytelling because they help the audience grasp the meaning faster.
  5. Use visualization strategically: The order of the visualizations in a story is crucial, because it determines the effect of the story. The sequence should be logical and make an impact on the viewer’s mind.

Source: http://blogs.sas.com/content/customeranalytics/2015/06/15/6-things-learned-data-storytelling/

How to create a simple Bar chart using D3.js? Part 2

In the previous post we saw what an svg element is and how to define it, how xRange and yRange set the domain and range behind xAxis and yAxis, and how to fit them to the data set.

The next step is to map the data through xRange and yRange into the plotting space and draw a line across it.

The d3.svg.line() function is used to draw a line graph. It creates a line generator, a function that returns the x and y coordinates computed from our data to plot the line. The line generator is defined like this:

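var lineFunc = d3.svg.line()
  .x(function(d) {
    return xRange(d.x);          // map the data x value to a pixel position
  })
  .y(function(d) {
    return yRange(d.y);          // map the data y value to a pixel position
  })
  .interpolate('linear');        // join the points with straight segments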

The .interpolate('linear') call tells d3 to draw straight lines between the points. The next step is to set the 'd' attribute of an svg path to the path description returned from the line function, whose coordinates have already been mapped through xRange and yRange.

vis.append('svg:path')
  .attr('d', lineFunc(lineData))
  .attr('stroke', 'blue')
  .attr('stroke-width', 2)
  .attr('fill', 'none');

As we can see above, the line color is set using stroke and the line width using stroke-width. The fill is set to none so that the area enclosed by the line is not filled.
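To make the role of the 'd' attribute more concrete, here is a minimal sketch in plain JavaScript of the kind of path string a straight-line generator hands to the svg path. It is a simplified stand-in and not D3's own implementation; the function name linearPath and the sample pixel coordinates are hypothetical.

```javascript
// Hypothetical pixel coordinates, i.e. values that have already been
// passed through xRange and yRange.
var points = [
  { x: 50, y: 300 },
  { x: 150, y: 120 },
  { x: 250, y: 200 }
];

// Build an SVG path description: "M" moves to the first point and
// each "L" draws a straight segment to the next point.
function linearPath(pts) {
  return pts.map(function (p, i) {
    return (i === 0 ? 'M' : 'L') + p.x + ',' + p.y;
  }).join('');
}

var d = linearPath(points);
console.log(d); // M50,300L150,120L250,200
```

Setting this string as the 'd' attribute of a path with a stroke and no fill would draw the two segments as one continuous line.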

The next step is to create the bar chart. As seen earlier, the axes have already been created, so we just need to modify the existing code a bit.

yAxis = d3.svg.axis()
  .scale(yRange)
  .tickSize(5)
  .orient('left')
  .tickSubdivide(true);

The yAxis starts at 5 because 5 is the minimum y value of the sample data. For a bar chart the yAxis needs to start from 0, so the yRange function needs to be modified like this:

yRange = d3.scale.linear()
  .range([HEIGHT - MARGINS.top, MARGINS.bottom])
  .domain([0, d3.max(barData, function(d) {
    return d.y;                  // largest y value in the data
  })]);
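As a rough illustration of what this scale does, the following plain JavaScript sketch maps a domain starting at 0 onto an inverted pixel range. It does not use D3; the linearScale helper, the HEIGHT and MARGINS values, and the sample barData are all assumptions made for the example, since the real values come from the previous post.

```javascript
// Assumed values for illustration only.
var HEIGHT = 500;
var MARGINS = { top: 20, bottom: 20 };
var barData = [{ x: 1, y: 5 }, { x: 2, y: 20 }, { x: 3, y: 40 }];

// A bare-bones linear scale: maps [d0, d1] onto [r0, r1] proportionally.
function linearScale(domain, range) {
  return function (value) {
    var t = (value - domain[0]) / (domain[1] - domain[0]);
    return range[0] + t * (range[1] - range[0]);
  };
}

var maxY = Math.max.apply(null, barData.map(function (d) { return d.y; }));
var yScale = linearScale([0, maxY], [HEIGHT - MARGINS.top, MARGINS.bottom]);

console.log(yScale(0));  // 480: zero sits at the bottom of the plot
console.log(yScale(40)); // 20: the largest value sits at the top
console.log(yScale(20)); // 250: half of the maximum is halfway up
```

The range is listed from large to small because SVG y coordinates grow downward, so larger data values have to map to smaller pixel values.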

For bar charts, an ordinal scale maintains a discrete domain, so we will use an ordinal scale instead of a linear one for the x axis. rangeRoundBands divides the chart width among the bars. The spacing between the bars is set to 0.1 here and can be altered to suit the user’s preference. The final xRange is defined as:

xRange = d3.scale.ordinal()
  .rangeRoundBands([MARGINS.left, WIDTH - MARGINS.right], 0.1)
  .domain(barData.map(function(d) {
    return d.x;                  // one band per x value
  }));
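The idea of dividing the width into bands can be sketched in plain JavaScript as follows. This is a simplified illustration, not D3's implementation: rangeRoundBands additionally rounds positions to whole pixels, which the sketch leaves out. The bandLayout helper, the WIDTH and MARGINS values, and the category names are assumptions.

```javascript
// Assumed layout values for illustration only.
var WIDTH = 480;
var MARGINS = { left: 50, right: 20 };
var categories = ['a', 'b', 'c', 'd'];

// Split [start, stop] into one equal step per category and reserve a
// fraction of each step (padding) as the gap next to each bar.
function bandLayout(names, start, stop, padding) {
  var step = (stop - start) / (names.length + padding);
  var positions = {};
  names.forEach(function (name, i) {
    // left edge of the bar for this category
    positions[name] = start + step * padding + i * step;
  });
  return { positions: positions, bandWidth: step * (1 - padding) };
}

var layout = bandLayout(categories, MARGINS.left, WIDTH - MARGINS.right, 0.1);
console.log(layout.positions, layout.bandWidth);
```

With these assumed numbers each step is about 100 pixels, of which about 90 go to the bar and the rest to the gap. Raising the padding makes the bars thinner and the gaps wider.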

The next step is to create rectangular bars for the chart data. Our sample data is bound to the rectangles: the x and y values set the position and height of each bar, and the band width of xRange sets its width.

vis.selectAll('rect')
  .data(barData)
  .enter()
  .append('rect')
  .attr('x', function(d) {
    return xRange(d.x);                 // sets the x position of the bar
  })
  .attr('y', function(d) {
    return yRange(d.y);                 // sets the y position (top edge) of the bar
  })
  .attr('width', xRange.rangeBand())    // sets the width of the bar
  .attr('height', function(d) {
    return ((HEIGHT - MARGINS.bottom) - yRange(d.y));   // sets the height of the bar
  })
  .attr('fill', 'grey');                // fills the bar with grey color
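The y and height arithmetic above can be checked with a small plain JavaScript sketch. It does not use D3; the yPixel and barGeometry helpers and the HEIGHT, MARGINS and maxY values are assumptions chosen for the example.

```javascript
// Assumed values; note the equal top and bottom margins.
var HEIGHT = 500;
var MARGINS = { top: 20, bottom: 20 };
var maxY = 40;

// Pixel y for a data value: 0 maps to HEIGHT - MARGINS.top and
// maxY maps to MARGINS.bottom, mirroring the yRange definition.
function yPixel(value) {
  var bottom = HEIGHT - MARGINS.top;
  return bottom + (value / maxY) * (MARGINS.bottom - bottom);
}

// SVG rectangles are drawn downward from their top-left corner, so the
// bar's y is its top edge and its height reaches down to the baseline.
function barGeometry(value) {
  var top = yPixel(value);
  return { y: top, height: (HEIGHT - MARGINS.bottom) - top };
}

var bar = barGeometry(30);
console.log(bar.y, bar.height, bar.y + bar.height); // 135 345 480
```

In this sketch the bottom edge of the bar (y plus height) lands exactly on the pixel position of zero only because the top and bottom margins are equal. With unequal margins, HEIGHT - MARGINS.bottom and the scaled position of 0 would differ, so that is worth checking when changing the margins.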

The final step is to set the CSS properties of a few of the elements.

.axis path, .axis line {
  fill: none;
  stroke: #777;
  shape-rendering: crispEdges;   /* hints which tradeoffs the browser should make when rendering paths and basic shapes */
}

.axis text {
  font-family: 'Arial';
  font-size: 13px;
}

.tick {
  stroke-dasharray: 1, 2;        /* draws the stroke of svg shapes as dashes */
}

.bar {
  fill: FireBrick;
}

Finally, our bar chart from the sample data is ready, with all the styling in place. A detailed version of the code is available here:

http://codepen.io/jay3dec/pen/fjyuE

Hope this post encourages you all to try a simple bar chart in D3.js.

Reference: https://www.sitepoint.com/creating-simple-line-bar-charts-using-d3-js/

 

Pie Charts and Interactivity?

For this blog post, I want to go back to the first class and talk about one of the first things we learnt: pie charts are a big no-no for visualization.

The visualization I picked shows the countries from which the largest immigrant populations came into the US over the period from 1960 to 2015. The data being analyzed is large and shows diverse, changing trends over time, but representing it with a pie chart makes it less intuitive and less informative. What makes it worse is the use of interactivity with pie charts to show how the data changes over time.

In the interactive visualization, the slider underneath lets us see the immigrant population for a particular year and how it changes. But:

Firstly, this visualization violates the very basic “eye beats memory” rule of thumb. When I use the slider to move from one year to another, it is very difficult to remember and compare how the trends have changed for different countries.

Secondly, what makes it even more difficult is that no actual numbers are shown for the countries; the slices for some countries look very close in size, which makes them confusing to compare.

Thirdly, I cannot tell how the slices are ordered. While I can make out what are perhaps the largest and second largest values, everything else is vague. The set of countries changing over time doesn’t help either.

Fourthly, while they have tried to use colors consistently, immigration from each country changes and some countries are added or removed over time, so the color changes are puzzling.

What would I do to change it?

I would go back to Hans Rosling’s style of display to show the countries with the largest immigrant populations coming into the US and how they have changed over the years. The interactive version would make more sense, and visualizing the data on one chart makes it easier to compare and analyze.

Or keep it simple: since the MPI wants to show numbers over a long period of time across several countries, I would use a table. It is simple and straightforward, and it allows for easy comparison.

Source – http://www.migrationpolicy.org/programs/data-hub/charts/largest-immigrant-groups-over-time

Word Cloud vs Packed Bubbles: A Real Practice

In my redesign project, I collected data on the 100 Instagram accounts with the most followers and grouped those accounts into categories based on the account holders’ occupations. It turns out that most account holders are singers and soccer players. The goal was to reveal what young people like on Instagram.
At first, I used a word cloud to display the result. As you can see from the image link, the word cloud puts the word with the most records at the center, and words with more records look larger. However, a longer word such as “Celebrity” looks similar in size to “Soccer”, although celebrities have fewer records than soccer players. The confusion arises because word length, and not only the number of records, affects how large a word appears, which misleads the audience.
Later I changed the word cloud into packed bubbles, which use circles of different sizes to show the number of records in each category. With circles wrapping the words, the comparison is less confusing than before. Thus, in my opinion, a word cloud looks fancy but is less useful for creating a good visualization.
You can also check out the article here, which discusses the pros and cons of word clouds in detail, for more information.
Thank you for this amazing quarter and thank you all!

After effects of redesign and deception.

Over the weeks, we have learned how to approach data: what to use, when to use it, and how to use charts to highlight your process, data and visualizations. So this is a last run through the basics of what I learnt from this course, what I understand, and how we can approach data to get the best visualizations for it.

There are at least three key concepts we need to understand when starting a data project:

  • Data acquisition should begin with a list of questions you want to answer.

This is the part where we decide on the audience, the claim, and the warrants that could support the claim. It is then advisable to gather all the variables and records, rather than only the subset that could answer the questions for the immediate story.

  • Data is often messy and needs to be cleaned.

When we acquire data from a source, we usually cannot confirm that all of it is clean, so a lot of time may be spent on cleaning. With multiple data sources, the fields that need to be joined may also contain a mix of values, misspellings, or variations that lower the quality of the data.

  • Data may have undocumented features.

The first thing required to understand the data is the data dictionary. Some fields may be coded differently from the usual convention, for example gender coded as M=1 and F=0. There may also be fields in the data that are missing from the dictionary, which makes the data harder to understand.
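The last two points can be sketched in a few lines of JavaScript. This is only an illustration under assumed data: the records, the field names, and the normaliseKey and decode helpers are all hypothetical, and the dictionary uses the M=1, F=0 coding mentioned above.

```javascript
// Hypothetical data dictionary: coded value -> readable label.
var dictionary = { gender: { 1: 'M', 0: 'F' } };

// Hypothetical records from two sources with inconsistent join keys.
var records = [
  { country: ' united states', gender: 1 },
  { country: 'United  States ', gender: 0 },
  { country: 'UNITED STATES', gender: 2 }
];

// Normalise a join key: trim, collapse inner whitespace, lower-case.
function normaliseKey(text) {
  return text.trim().replace(/\s+/g, ' ').toLowerCase();
}

// Decode a coded field and flag codes the dictionary does not document.
function decode(field, code) {
  var labels = dictionary[field] || {};
  return labels.hasOwnProperty(code) ? labels[code] : 'UNDOCUMENTED(' + code + ')';
}

var cleaned = records.map(function (r) {
  return { country: normaliseKey(r.country), gender: decode('gender', r.gender) };
});
console.log(cleaned);
```

Normalising in this way only handles spacing and capitalisation; genuine misspellings would still need a lookup table or manual review. Flagging undocumented codes rather than guessing their meaning keeps the gaps in the dictionary visible.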

After the time spent researching and analyzing data to build the visualizations for the individual projects, these are the lessons I have learnt and will carry into all future data projects.

 

http://datajournalismhandbook.org/1.0/en/understanding_data_2.html

 

Visualization redesign: Rules of engagement

Design is not a science. But “not a science” isn’t the same as “completely subjective”. In fact, the critique process has brought discipline to design for centuries. For visualizations which are based on an underlying shared data set, there’s an opportunity for an additional level of rigor: to demonstrate the value of a critique through a redesign based on the same data.

Criticism through redesign may be one of the most powerful tools we have for moving the field of visualization forward. At the same time, it’s not easy, and there are many pitfalls, intellectual, practical, and social. How can we use the tool of criticism to best advantage, with awareness and respect for all involved? Here are some suggestions, which fall into three categories: maintaining rigor, respecting designers, and respecting critics.

1. Maintain rigor

As with a scientific experiment, it’s important to know the reason for a redesign — what is being “measured”, in a sense. There are many possible goals for a visualization. A critic who creates a redesign should be explicit about the goal — and the fact that they may be interested in a different goal than the designer.

Second, critics must be honest about any simplifying assumptions. If a redesign shows less data than the original, that should be mentioned up front. Otherwise, there’s a danger that any perceived simplicity of a redesign is really just the result of a reduction in data.

Part of maintaining rigor is acknowledging situations where professional judgments don’t agree, and finding ways to come to an understanding. The first step is to have a conversation about the source of the disagreement. Very often it turns out that different professionals have different criteria for success for a visualization, or have different goals in mind; clarifying these is extremely useful to the field.

2. Respect the designer

All redesigns have the potential to seem adversarial, as if the critic is pointing out flaws in the designer personally, asserting their own superior skills, or even assigning blame for a disaster. That isn’t a pleasant experience, so making the process friendlier for the designer is a good idea.

3. Respect the critic

Criticism is hard, as hard as design. Indeed, in established media (books, movies, music) good critics are recognized as experts in their own right. As a field, we should give the same respect to our visualization critics.

A point for designers is to keep in mind the goal of the critique process: ultimately, none of this is a personal evaluation, but instead a way for the field as a whole to improve.

Conclusion

Data visualization is still a new field. It has already become an essential medium for journalists, scientists, and anyone else who needs to understand data. But the medium is far from understood. It’s early still, and there’s a lot of room for improvement. Therefore redesign is an essential part of visualization criticism.

Source: https://medium.com/@hint_fm/design-and-redesign-4ab77206cf9#.7l57fdh70

Interactive Marketing Dashboard

A marketing dashboard prominently displays the funnel and other related metrics to help marketing decision-makers allocate campaign spend. It’s important that each lead, web visit and win is assigned a tangible value in order to help marketers correctly gauge the success of key campaigns.

Good points about this dashboard.

  1. The dashboard conveys all the important KPIs to the audience in a very simple manner.
  2. It has a summary on both sides of the dashboard and a section at the top, so the user gets an overall view of what is happening in the data. These important KPIs are displayed with the aid of appealing visualizations.

Improvements:

  1. There is no single color legend shared across the dashboard.
  2. The ‘Conversion by Form’ visualization could be shown as a bar graph, since in the current graph it is difficult to tell how much has been converted for each page.
  3. Color legends are not defined anywhere on the dashboard. The user has to hover over a chart to find out which color belongs to which marketing channel in the market channel breakdown, or to which page in the landing page success rate.

Reference:  https://dashdemo.sisense.com/app/main#/dashboards/582a26af0ff585080700002c?r=false&h=false&t=false&l=false&volatile=true

 

 

Bar Charts – Keeping it Simple

One of the most basic and most widely used charts is the bar chart. Because it is so basic, it can also be seen as one of the plainest, and the urge to spruce it up and make it flashier is sometimes hard to resist. However strong that urge is, it is better to ignore it. By adding extra effects, you run the risk of muddying the point of your visualization and losing its meaning. They also pull the viewer’s attention away from the story of your visualization. Instead:

  1. Remove 3D effects, shadows, gradient color fills, and any other flashy special effects
  2. Order the data
  3. Show who the author of the data is and cite it
  4. Label the axes
  5. See what additional information can be used to support the data you already have

Reference:

https://www.prometheusresearch.com/information-design-five-ways-to-improve-your-bar-graph/

The New York Times Does a Great Job on Data Visualizations: What Can We Learn From It?

One of the secrets behind the New York Times’ continued success is its synthesized design process. Below are the ten characteristics that make its data visualization articles powerful and successful.

  1. Clarity of context and purpose: The New York Times establishes the goal and clarifies the audience’s needs before designing a visualization. It asks:
    1. Does it interact with the audience and let them feel connected?
    2. Does it enhance a specific editorial perspective?
    3. Which format is best for delivering the message?
  2. Respect for the reader: The New York Times makes its subject accessible, delivering both clarity and simplicity so that readers can get to the point quickly. It also adjusts immediacy to the level of the subject: where it wants readers to put in more effort and be rewarded with the resulting insight, it increases complexity.
  3. Editorial integration: The graphic is combined with the article so that the two are coherent and support each other.
  4. Clarity of questions: The format of each visualization is aligned with the questions it is answering.
  5. Data research and preparation: It puts a lot of effort into cross-department research and the development of programming libraries to obtain rich, deep data sources and offer multi-dimensional information.
  6. Visual restraint: It deploys the right colors to catch readers’ attention and let them recognize the message immediately.
  7. Layout and placement: Whether it is full columns, double page spreads or dramatic diagonals, the Times ensures each graphic has the perfect stage to amplify the impact of the visual’s relationship with an article.
  8. Diversity of techniques: Its interactive graphics show immense flexibility and versatility. Representations are not repeated, and each piece is built attentively and informatively.
  9. Technical execution: Charts are displayed in multiple formats.
  10. Annotation: Labels, descriptions, and text explanations are used well to help readers understand the graphic.

Reference: http://www.scribblelive.com/blog/2012/04/02/10-things-you-can-learn-from-the-new-york-times-data-visualizations/

When subway/tube map meets data visualization

For the last blog post, I wanted to know how many types of visualization are out there and when to use each of them.

By searching the Internet, I learned that most visualizations fall into categories such as 1D/Linear, 2D/Planar, 3D/Volumetric, nD/Multidimensional, Tree/Hierarchical and Network.

What caught my attention most was the subway/tube map under the Network category. At first glance I thought it was just for subway transportation, but it turned out to be a form of network visualization.

The subway/tube map connects different tasks into one network. In the MAP of All YAHOO! APIs and SERVICES we can see clear lines of tasks, with each line connecting to other lines of tasks. The network visualization distinguishes each line of tasks with its own color, so readers can follow the particular line they want to focus on. Each “station” has its own name and a unique symbol, so readers can tell what the task or function at a particular spot is without reading the words. A light background grid also keeps the tube map organized along a restricted set of lines, which helps readers rely on their eyes rather than their memory.

When the subway/tube map meets data visualization, it creates a unique and strongly organized network visualization.

Reference

  1. http://lgimages.s3.amazonaws.com/data/imagemanager/62927/flickr_phploveme_2957594315.png
  2. http://guides.library.duke.edu/datavis/vis_types
  3. https://tfl.gov.uk/corporate/about-tfl/culture-and-heritage/art-and-design/harry-becks-tube-map