Blog 2: Immigration Truths

Immigration was perhaps the most complex, debated, and controversial topic of the 2016 United States Presidential Election. In fact, “over 60% of registered voters reported that immigration was an important factor on how they voted” (https://ballotpedia.org/2016_presidential_candidates_on_immigration). Donald Trump, in particular, made the topic a centerpiece of his presidential campaign and took a drastic stand on the issue. Ultimately, Trump set out a plan to cut the number of immigrants allowed into the US, particularly from Latin America, and aims to do so by building a wall along the southern border of the US.

In his arguments, Trump repeatedly stated that he would fix the “lax regulations” implemented under the Obama administration and reverse the “sky-rocketing” number of illegal immigrants coming to the US from Mexico. Trump based his viewpoint on popular belief and negative connotations rather than real data and facts, as I will discuss below.

When researching immigration during the 2016 election, I came across a very interesting and useful article that essentially disproved Donald Trump’s arguments on immigration. The article includes two very powerful graphs which relay clear and concise conclusions on actual immigration numbers in the US.

Admittedly, at first glance this graph does highlight the spike in immigration from Mexico to the US (although it occurred during the 1990s, not during Obama’s administration). This graph is what most Americans saw during immigration debates and what Trump used to argue that the numbers are sky-rocketing.

One could argue against Trump with this graph alone, because it is clear that even though there was a jump in immigration, the numbers have already decreased more than threefold and continue to decline. However, a stronger rebuttal is the following graph from the article, which expresses immigration as a percentage of population rather than as raw numbers.

The graph above gives viewers a more accurate sense of the impact of immigration. Once adjusted for population, the “sky-rocketing” numbers are small compared to other immigration waves we have had in the past. The number of immigrants today relative to population is only about 0.5%, which gives viewers a much different feel than a raw number of 3,000,000. By showing both visualizations, the author has created a simple yet conclusive analysis of the real immigration situation in the US. The wave had already smoothed out by 2010, which indicates that the drastic measures Trump is proposing are unnecessary.
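The switch from raw counts to a share of population can be sketched in a few lines of Python. This is only an illustration: the period labels, arrival counts and population figures below are made-up placeholders, not values read from the article's charts. Only the idea of dividing arrivals by population comes from the source.

```python
# Illustrative only: these figures are made-up placeholders,
# not data taken from the article's charts.
periods = [
    # (label, immigrants arriving in the period, US population at the time)
    ("earlier wave", 1_000_000, 50_000_000),
    ("recent peak", 3_000_000, 280_000_000),
]


def share_of_population(arrivals, population):
    """Return arrivals as a percentage of the population."""
    return 100.0 * arrivals / population


for label, arrivals, population in periods:
    pct = share_of_population(arrivals, population)
    print(f"{label}: {arrivals:,} arrivals = {pct:.2f}% of population")
```

With placeholders like these, the period with the larger raw count ends up with the smaller percentage, which is the kind of reversal the second graph makes visible.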

I chose this source for the blog post because I found these graphs very successful in their presentation. It is amazing how drastically the results can change simply by switching the metric from a sum to a percentage. In addition, these graphs convey results that contradict the most powerful people in our country and half our population. It is easy to fall prey to misconceptions about data when the topic is so controversial.

http://metrocosm.com/animated-immigration-map/

 

Uber and alcohol related crashes

Introduction

The above chart was featured in The Economist earlier this month. It looks at the impact of Uber on the number of alcohol-related crashes in New York City, and it presents these numbers county by county so that the counties can be contrasted with one another.

 

Some of the key takeaways from the above chart are that alcohol-related crashes have declined since Uber was introduced (indicated by the red line in the timeline). The graph does a fine job of showing the drop in crash rates in all the counties except Staten Island.

However, the representation does not do full justice to the point that the author wants to convey. Some key questions that I would consider before creating a visual representation like this would be –

1. What is the key point I am trying to convey?

The author wants to convey that one thing led to another: the introduction of Uber led to fewer alcohol-related crashes. One of the key ways to prove this point would be to show the negative correlation between the two parameters. However, the chart says nothing about how growth in Uber adoption relates to the drop in accidents after Uber was introduced. The other problem with the visualization is that it talks about the number of accidents in general and not specifically about accidents related to drunken driving.

2. Is it possible that if I sliced this data across a different time duration, I might be able to prove otherwise?

While the overall drop in accidents is clear, there is also a hockey-stick-like trend visible after 2012 in Brooklyn and Queens. So if I wanted to prove that the author’s claim is wrong, all I would have to do is zoom in on 2012-2013 and show the increasing trend.

3. Why break by counties when you are talking about NYC as a whole?

The fact that the author has diced the geography by county creates a question about the consistency of this trend at the overall level. When rolled up at the overall level, it might be possible that this trend is not quite accurate.

4. Why 3 month moving average?

The metric chosen for the graph above is the 3-month moving average. As we know, moving averages smooth out spikes in a trend. However, despite the smoothing, spikes are still visible, indicating high variance. So rather than visualizing the moving average, the author might have been able to make a stronger case by simply visualizing the absolute number of accidents every year.
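To make the two metrics concrete, here is a minimal sketch of a trailing 3-month moving average next to a yearly total. The monthly crash counts are hypothetical and are not taken from the Economist chart. The function name and the choice of a trailing (rather than centered) window are also assumptions for illustration.

```python
def moving_average(values, window=3):
    """Trailing moving average; the first window-1 points are skipped."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical monthly crash counts for one year (not real data).
monthly_crashes = [40, 55, 31, 60, 28, 52, 35, 49, 30, 45, 27, 41]

smoothed = moving_average(monthly_crashes, window=3)
yearly_total = sum(monthly_crashes)

print("3-month moving average:", [round(v, 1) for v in smoothed])
print("Spread of raw values:", max(monthly_crashes) - min(monthly_crashes))
print("Spread of smoothed values:", round(max(smoothed) - min(smoothed), 1))
print("Yearly total:", yearly_total)
```

The smoothed series varies less than the raw one but still moves up and down, while the yearly total collapses each year into a single number that is easy to compare across years.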

 

What could the author have done better?

To begin with, the author could have defined the metric specifically around alcohol-induced accidents rather than accidents in general. In addition, showing the negative correlation between Uber adoption and the number of alcohol-related accidents would have gone a great deal further in explaining the point the author is trying to make (a scatter plot creates a stronger impression when we talk about correlated events, even though correlation does not imply causation). The author could also have swapped the metric of choice, the 3-month moving average of the number of crashes, for the absolute number of crashes caused by drunken driving rolled up at the year level. Had the author added these elements, I am sure the chart would have been far more convincing about the claim it is trying to make.
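As a rough sketch of what "showing the negative correlation" could mean in practice, the snippet below computes a Pearson correlation coefficient between Uber usage and alcohol-related crashes. Both series are invented for illustration, and the variable names are assumptions. A real analysis would need actual adoption and crash data for the same periods.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical figures per period (not real data).
uber_trips_thousands = [10, 25, 40, 60, 85, 110]
alcohol_crashes = [52, 50, 47, 41, 38, 33]

r = pearson_r(uber_trips_thousands, alcohol_crashes)
print(f"Pearson r = {r:.2f}")
```

A value of r close to -1 would support the "more Uber, fewer crashes" story, though it would still only show correlation, not causation.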

Email Security

Cyber security is the buzzword today. Institutions are getting more and more cautious about how they should secure their applications. The above dashboard comes from a company that provides malware protection.

A dashboard, in my view, is supposed to convey critical information that is important to the intended audience. There are a few things that are good about this visualization and a few things that could be done better. Following is my take on this visualization.

Things I like about this:

  • One of the graphs above shows an array of threat vectors, which gives a good first-level view of the kinds of threats they saw on the accessed network.
  • The source countries are depicted from high to low flow in a bubble chart with each country’s flag, which quickly helps identify these countries and the scale of the attacks coming in.
  • Both of the above (threat vectors in the first case and attack flow from different countries in the second) show comparisons between different elements.

There are a few things that could be done better in this visualization:

  • A good dashboard should tell a story by combining and linking different data elements. This dashboard just gives out a lot of information, and readers have to interpret the data on their own.
  • The counts of 58 Un-reviewed, 9 Discovered and 25 Quarantined give first-level information, after which the user would expect more detail on the total number of threats detected or total events seen and the associated breakup. But the following graph just mentions 220 threats in the last 7 days, and the associated graph does not intuitively give any breakup of that first-level information. If it did, it would link the two elements as total threats versus a breakup of the threats detected on each day.
  • The next two graphs, on severity and threat type, depict incomplete information. The threat type graph lists only the types of threats and shows no numbers for each type against the total number of threats detected, which would give an overall picture. The severity graph gives severity numbers, but the components on the X axis for which this severity is depicted are completely unknown. Additionally, as there is no known benchmark to compare these values against, this graph doesn’t help in taking any action.
  • Overall, this dashboard lacks drill-down and further explanation of each of the elements mentioned.

With the information currently available, a better way to present this visualization could be:

https://drive.google.com/open?id=0Bzau8FgD0T1AVHRSalNXY0x2V2M

A threat severity graph with details on the severity numbers and the names of the components against which severity is marked would give detailed insight to the audience.

Overall the above dashboard links the elements better, compared to the original dashboard.

Note: ‘Threat Type’ numbers and ‘Total Threat’ break up numbers are dummy numbers assumed just to demonstrate in the visualization above.

Reference: https://blog.threattrack.com/cso/wp-content/uploads/2014/03/ThreatSecure-Dashboard-Threat-Landscape.jpg

MVP Debate

 

If you are interested in who is going to make his teammates’ shooting better in the regular season, this picture may give you some clue. The X-axis shows the usage rate and the Y-axis shows the true shooting percentage. Evidently, Curry does much better than other key players.

 

This picture also shows that Curry, James and Westbrook have a big influence on their teams, while Harden and Leonard do not. After looking at those two pictures, do you really want to change your mind and vote for Curry instead of Harden?

Data does not deceive, but the person who presents it can, because he must have a “goal” or “point” in mind before he starts drawing the picture. The title of the article is “The case for Curry the MVP”, and you can also find another article called “The case for Harden the MVP”, and so on. If you really want to make a “fair” decision on voting for the MVP, you may need a couple of days or weeks to research a huge amount of data, analyze it from a variety of angles and draw a conclusion. And even if you can do this, do you really expect every fan to do the same? Impossible. In fact, fans are used to picking the MVP from easily understood data such as raw statistics (points, rebounds, assists); that is the domain they agree on (at least most of them). Although these reports are really good and tell stories from different angles, they are not convincing to every group of readers. I like those reports, and you can say I am the audience of the article, but not all of us are. That is what I want to say: a given analysis report only suits its particular audience.

Source: https://fivethirtyeight.com/features/the-case-for-stephen-curry-mvp/

The Case For James Harden, MVP

 

 

 

What happens in an Internet Minute?

I still remember my childhood days, which were full of joyful moments without any smartphones or similar gadgets. It is hard to believe how quickly technology has captivated the whole world. Not a single moment passes when we aren’t hooked on our iPod, lazing around with an Xbox or chatting over the phone.

This visualization depicts just how rapidly social media has surrounded everything we know. The data shown here supports the basic claim that the majority of our time is spent online. The visualization shows all the popular social media platforms and their impact on us. One can easily sense that the current numbers have grown at a rapid pace, and because they keep increasing day by day, it is hard for anyone to predict such numbers for the future.

What I really like in this visualization is how quickly it gives you a glimpse of 16 popular applications and how vast they have become today. Most of us have been using such apps for quite some time while forgetting just how much everyone around us is equally hooked on them. Hence, seeing such huge numbers helps us appreciate the impact the internet in general has on us.

Despite getting the message across loud and clear, it still lacks a few fundamentals. Firstly, there should be additional numbers showing the percent increase in users from last year, to help us understand the pace of change. Moreover, it could also have shown the country where each app is most popular, giving us insight into how different apps are favored by different cultures and what may become a trend in the future. These numbers are presumably the average usage of each app per minute; however, it would have been more useful had they been broken down by time of day, to show which times suit most people best. Also, real-time updating of these numbers, even though it is quite a tough task to achieve, would have given a new depth to this dashboard.

To conclude, we can see how visualization concepts and tools can be used to depict anything in a powerful manner. Users should not only be able to make sense of the message being delivered but also use different measures to extract meaning out of each dashboard.

http://www.onlinecollegecourses.com/2009/12/07/50-excellent-scholarly-literary-criticism-blogs/

Christmas Chaos

This infographic is designed to convey information about shopping and gifting trends over the holiday seasons in the years 2004 to 2015.

http://visual.ly/christmas-and-new-year-holidays-around-world

The entire infographic is full of badly executed visualizations, but I am going to pick one bar chart in particular to comment on – “Per Person Holiday Spending” which shows the average per customer spend over the years 2004 to 2015.

I liked a few things about this chart –

  • The title is bold and clear, so the user understands immediately what the chart is about. To that end, the designer has made all the titles bold to make them pop out and be effective.
  • The x and y axes are uniformly scaled, with 1-year increments and $20 increments respectively.

Things I didn’t like about the chart –

  • COLORS – The colors are neither consistently used nor visually pleasing. The designer has used 4 different colors in no particular sequence. At first glance, you think that a certain color (red, for instance) means something in terms of the year or the amount being spent, when it has no meaning at all. The colors have been randomly assigned to each year.
  • 3D BARS – The 3D representation of this bar chart was unnecessary. If you look closely, the tops of the bars are not uniform, and the viewer’s perspective is skewed. Although this does not affect any information being communicated, it is an eyesore and adds to the overall visual chaos that this chart is.

Visualizing “Disasters” through the lens of interactive dashboards!

Mishita Agarwal

Dashboard: https://www.fema.gov/data-visualization-summary-disaster-declarations-and-grants

Introduction
This interactive dashboard presents visualizations of federally declared disasters in the United States since 1953. It also visualizes the disaster assistance and preparedness grants that the Federal Emergency Management Agency (FEMA) has released for these disasters since 2005. All the information is provided at the national as well as the state level.

What is most appealing….
Interactive dashboards are the best way to showcase visualizations when a huge amount of data has to be shown across a wide span of regions.
The most appealing feature of this interactive dashboard is its user-friendly interface. It provides the right amount of information at every step without overcrowding the page, letting the viewer click for additional, more granular information. It enables the viewer to filter the information by category and sub-category with just one click.

The highly interactive nature of this dashboard makes it very easy to dig deep into the data and to observe the pattern of disasters in various states, which can otherwise become very complicated to observe in static dashboards. Overall, this feature can simplify search to a great extent and make the user experience very pleasant.

The geographical map at the top is a very good way of showcasing states. One can select a state and visualize its disaster patterns. It also makes it easy to visualize disasters across regions: for example, I could find the disaster patterns in the easternmost or northernmost state, which would be difficult if I had to select a state from a drop-down list.
The bar graph is an effective tool to visualize the disaster types and the frequency of each disaster in a selected state. It makes visualization even easier by providing information on declared disasters across the counties in a state.

When we select any state, the line graph of declared disasters across years is updated automatically. A line graph is a very effective medium for analyzing the temporal pattern of disasters.
Finally, when the viewer scrolls down, the grants provided in each disaster category are shown in a bar graph, through which one can easily analyze the amount spent in every category, such as fire incidents, preparedness, etc.

But still there is a lot of space for improvement……
I think the data provided in the Excel sheets does not match the numbers in the visualization. For example, as per the Excel sheets, 2820 fire incidents were reported nationally; however, the visualization shows only 989 fire incidents. Similar differences exist for other categories as well.

Next, as state abbreviations are not provided on the map, a person who is not familiar with the map of the USA has to move the cursor around the map to find the desired region. Supplementing the map with state abbreviations would make the filtering process easier and quicker.

Also, although each category of disaster is divided into subcategories, which helps us learn more about the type of disaster, the description given for each sub-category is very complex and does not give any information about the actual cause of the disaster. For example, the disaster “Fire” is further segmented into “Angel Fire”, “Eighty-two fire”, etc. It seems that all these terms are tied to some special characteristic, like the place of the incident or the type of resource. But the visualization does not give a clear picture of the reasons behind a disaster; fire incidents, for example, could be labeled “household-fire”, “human-caused fire”, etc. Tying incidents to their causes can be very useful for working on the root causes of the incidents and preventing their occurrence in the future.

The year-wise map becomes very cluttered when we keep clicking on the “+” sign to go to a more granular level such as quarters and months. So, there should be a separate bar graph to show the month-wise pattern of disasters. This would help FEMA identify whether there is any relation between a type of disaster and a particular month. For example, a very high frequency of non-severe storms in a given month would alert the agencies to take immediate measures to prevent a major disaster from taking place.

I also feel that the visualization could have been better if colors were used to depict certain properties. In general, colorful visualizations tend to be more attractive and effective than black-and-white ones. For example, the five types of grants could be shown in different colors on the bar graph that plots yearly total grants, with each bar containing five colored regions corresponding to the five grants and the length of each region proportional to the amount of the grant. That would help a viewer compare all five grants with each other for a particular year on a single plot.
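The stacked bar described above boils down to giving each grant type a segment that starts where the previous one ended. The sketch below computes those segments for one year. The grant type names and amounts are hypothetical placeholders, not FEMA's actual categories or figures.

```python
def stacked_segments(amounts_by_type):
    """Return (grant_type, bottom, height) for each segment of one bar."""
    segments = []
    bottom = 0.0
    for grant_type, amount in amounts_by_type.items():
        # Each segment starts at the running total of the ones below it.
        segments.append((grant_type, bottom, amount))
        bottom += amount
    return segments


# Hypothetical grant types and amounts (millions of dollars) for one year.
grants_one_year = {
    "grant type A": 120.0,
    "grant type B": 80.0,
    "grant type C": 45.0,
    "grant type D": 30.0,
    "grant type E": 25.0,
}

for grant_type, bottom, height in stacked_segments(grants_one_year):
    print(f"{grant_type}: from {bottom} to {bottom + height}")
```

A plotting library would then draw each segment in its own color, using `bottom` as the starting point and `height` as the segment length.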

Though the disasters are given from 1953, the grant data is available only since 2005. So, does it mean that FEMA, which was established around 34 years ago, started providing grants only in the last twelve years? This is important to know because there could have been several states that were highly vulnerable to disasters but did not receive a grant in any segment.

Conclusion:
Overall, this interactive dashboard is very useful and provides a complete story of disasters and grants. The icons used at the end effectively convey the message of supporting the community’s emergency management efforts. However, the weaknesses should be addressed to make it more effective, as this dashboard is on a government website and many people refer to the visualizations posted there.

“A disaster is a natural or man-made event that negatively affects life, property, livelihood or industry often resulting in permanent changes to human societies, ecosystems and environment.”

Screenshots to visualize some errors and plot of additional visualizations:

https://docs.google.com/a/scu.edu/document/d/1OoptzYIJp6VE1zDJtxiVI_TDjI6pBPl-j5Y3-UZKRNw/edit?usp=sharing

 

NFL Ratings Graph

This graphic is a bar chart designed to show NFL viewership trends over the past four seasons. The chart shows the combined number of viewers across the four major networks (Fox, CBS, ESPN, and NBC) for the first eight weeks of the 2013-2016 seasons, for three different age demographics: 18-34, 18-49, and 25-54 year old adults. The point of the chart is to illustrate a few general, big-picture points. First, NFL viewership really has been going down as of late. Second, this is not a new trend; it has been going on for at least three seasons (which means that certain controversies that occurred right before the 2016 season are not solely to blame for the drop in viewership). The graph also shows that the NFL appears to be struggling with the age demographic considered to be the most valuable (18-34 year olds). The graph is also designed to lead into several more detailed graphs that are placed later in the article. The graph on its own is not trying to give arguments as to why fewer people are watching the NFL; it is designed to supplement the main article, which does provide some possibilities.

Before I go on, let me address some immediate questions that might come up while looking at this visualization. This graph focuses only on viewership by age group and does not break things down by other factors, such as gender and race. That is because this article and a separate one feature other bar graphs that focus on those factors. I will only be focusing on the first bar graph.

One thing that the graph does well is that it starts the y-axis scale at 0. From what I understand, one of the main mistakes in bar graphs is starting the scale at something other than 0, which can make things look different than they actually are. For example, if the graph had started at 3000 viewers, then the 18-34 viewership bars would be very short, which would give the impression that this demographic is not important (even though it is).
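The distortion from a non-zero baseline can be put into numbers. The sketch below compares how tall one bar looks relative to another when the axis starts at 0 versus at 3000. The two viewership values are hypothetical, chosen only to illustrate the effect, and the function names are assumptions.

```python
def drawn_height(value, baseline):
    """Height of a bar as drawn when the y-axis starts at `baseline`."""
    return max(value - baseline, 0)


def apparent_ratio(smaller, larger, baseline):
    """How tall the smaller bar looks relative to the larger one."""
    larger_height = drawn_height(larger, baseline)
    if larger_height == 0:
        raise ValueError("baseline hides the larger bar entirely")
    return drawn_height(smaller, baseline) / larger_height


# Hypothetical viewership figures (thousands), not taken from the chart.
viewers_18_34 = 3500
viewers_18_49 = 9000

honest = apparent_ratio(viewers_18_34, viewers_18_49, baseline=0)
truncated = apparent_ratio(viewers_18_34, viewers_18_49, baseline=3000)

print(f"Axis from 0:    smaller bar is {honest:.0%} of the larger")
print(f"Axis from 3000: smaller bar is {truncated:.0%} of the larger")
```

With the axis starting at 0, the ratio of bar heights equals the ratio of the values themselves. With a truncated axis, the smaller bar shrinks far more than the data justifies.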

Another thing that this graph does well is visual clarity. I like that the viewership numbers are not all on top of or inside each bar. Instead, the graph stacks the numbers, which prevents them from crowding one another and keeps things clear. Having the numbers also makes it much clearer that viewership is actually declining. For example, if the red and green bars in the 18-49 portion did not have numbers, one might think that viewership did not change. I also think that the graph, for what it’s trying to do, conveys its information well enough. It makes it clear that for every age demographic, fewer people have been watching in each consecutive season (although using only the first eight weeks, when the games might not be considered as important, could make the graph less accurate than it wants to be).

The most obvious criticism of the graph is that there is an overlap between the first and second demographics, and an overlap between the second and third demographic. This adds some confusion to the graph. If the graph is trying to compare the drop in viewers by age group, then this graph is not clear. In addition, as mentioned before, this graph does not do a good job at letting the audience draw any conclusions as to why viewership is down.

If it wanted to show this, it would make sure the age groups were separate, without an overlap. I also question why the visualization is not a line graph, as the point is to track a trend over four years. The line graph could have three differently colored lines, one for each age demographic (no overlap), and the weeks of each season on the x-axis. I would also compare the viewership totals with those of other sports, so that there is some comparison point for the NFL. If NFL viewership is still much higher than that of its competitors, then the drop in viewership might not be as much of a problem.
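Part of the overlap can be removed with simple subtraction, because the 18-34 group is contained in the 18-49 group. The sketch below derives a separate 35-49 band that way. The viewership numbers are hypothetical. The 25-54 group cannot be split the same way from these three totals alone, since the 25-34 and 50-54 shares are not given.

```python
def band_35_to_49(viewers_18_34, viewers_18_49):
    """Viewers aged 35-49, obtained by subtracting the nested 18-34 group."""
    if viewers_18_49 < viewers_18_34:
        raise ValueError("the 18-49 group must include the 18-34 group")
    return viewers_18_49 - viewers_18_34


# Hypothetical viewership figures in thousands, keyed by season.
seasons = {
    2013: {"18-34": 4000, "18-49": 9500},
    2016: {"18-34": 3300, "18-49": 8000},
}

for season, viewers in seasons.items():
    separate = {
        "18-34": viewers["18-34"],
        "35-49": band_35_to_49(viewers["18-34"], viewers["18-49"]),
    }
    print(season, separate)
```

Plotting the 18-34 and 35-49 bands side by side would then compare two groups that share no viewers.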


 

The NFL’s ratings are down – but just who exactly isn’t watching anymore?

Blog 1: “Tips & Tricks to Avoid the Crowd”: Data for Introverts

https://salankia.shinyapps.io/satRday2016DataViz/

This dashboard was part of a visualization challenge called SatRdays, part of a regional European conference (http://satrdays.org/).

As an introvert, there may be several factors that influence decisions you make when you travel.

This dashboard has a couple of different visualizations to help the traveling introvert guarantee that he or she will have the least amount of crowd interaction. The designer has highlighted 3 main areas where one may encounter crowds when traveling through airports (direct flights from Belgium):

  • traffic to connecting airports
  • utilization of planes
  • seasonal impact to travel

This data could also be useful to anyone interested in optimizing travel time.

———————–

(entire viz cannot be seen in screenshot, visit: https://salankia.shinyapps.io/satRday2016DataViz/ for the complete view)

Images didn’t insert into the blog, use link below to view:

https://salankia.shinyapps.io/satRday2016DataViz/

———————–

This dashboard presents a lot of data. As a user, I can pick an airport that is less crowded, cross-check my selection with plane utilization, and then see how seasonality would further impact my trip. The data is separated into different tabs, and filters can be set independently on each, which allows for independent analysis. The user is presented with a lot of good information that could help influence decisions, such as the average number of people on planes leaving Belgium as well as trends to destination cities from Belgium.

——————————

It is up to the user to keep track of all selections. A way to improve this would be if filter values (or user choices) were global—if one selection is made, it changes everywhere.

For example, if I made a selection on the connecting airport tab, then it should show me the corresponding utilization.

The data could also be shown on one tab, giving the user all the information at once in one place. The user wouldn’t have to keep going between tabs.

The designer could also be more consistent with color. It is hard to tell just by looking: the color appears to be based on value, but assigning a color to each specific country might make values easier to spot even when the layout of the data is different.

—————

Taking this a step further, traffic at the origin airport (Belgium), terminals, airports and additional factors could be examined for crowd factors.

Just save the pies for desserts

Charts are there to help us understand more about the data. But it’s so easy to design a bad visualization. In general, the point of charts is to make it easier to compare different sets of data. The more information a chart is able to convey without increasing complexity, the better.

The primary strength of a pie chart is showing the part-to-whole relationship. However, pie charts only make it easy to judge the magnitude of a slice when it is close to 0%, 25%, 50%, 75%, or 100%. The visual attributes of a pie chart are hard to compare.

Here’s a pie chart of the party breakdown of the European parliament:

 

 

Can we really compare the slices to figure out the differences in size between each and every pie slice? The only thing that is obvious to us is that the EPP and S&D are bigger than any other pieces. And the colors of the individual slices are very similar, which makes it very hard for users to match the labels to the slices.

 

Moreover, people love dressing up their pie charts today. Adding a third dimension of depth to the picture, throwing in some lighting effects and contoured edges. It’s pretty and eye-catching, but is it more meaningful or easier to interpret? Actually, by adding depth to the pie and changing its angle, we’ve made it more difficult to interpret. People do this all the time, and that’s because an angled 3D pie chart is an excellent way to lie to you.

 

Looking at this chart, S&D — the red party — appears to be roughly even with EPP, the teal party. It looks greater than it actually is, because of the depth that’s been added. The slices are now more difficult to compare, because the angle skews their appearance.

If we pull the individual slices apart, will that make it easier to compare them and figure out an ordering from largest to smallest? The reality is, humans aren’t very good at comparing slices of a circle when it comes to size.

A dashboard should present information in a way that can be quickly read and easily understood. Bar charts make it easier to compare the magnitudes of each part.

Here is a bar chart of the same data. You can compare each and every party to each and every other party. You’re just comparing the lengths of rectangles in order to understand what’s going on.

If a bar chart is doing its job, you shouldn’t have to struggle. Just save the pies for desserts.

Reference:

http://www.businessinsider.com/pie-charts-are-the-worst-2013-6

The Worst Chart In The World

https://www.perceptualedge.com/articles/visual…/save_the_pies_for_dessert.pdf

Save the Pies for Dessert