A toothpaste brand claims their product will destroy more plaque than any product ever made. A politician tells you their plan will create the most jobs. We're so used to hearing these kinds of exaggerations in advertising and politics that we might not even bat an eye. But what about when the claim is accompanied by a graph? After all, a graph isn't an opinion. It represents cold, hard numbers, and who can argue with those? Yet, as it turns out, there are plenty of ways graphs can mislead and outright manipulate. Here are some things to look out for.

In this 1992 ad, Chevy claimed to make the most reliable trucks in America using this graph. Not only does it show that 98% of all Chevy trucks sold in the last ten years are still on the road, but it looks like they're twice as dependable as Toyota trucks. That is, until you take a closer look at the numbers on the left and see that the figure for Toyota is about 96.5%. The scale only runs from 95 to 100%. If it went from 0 to 100, it would look like this.

This is one of the most common ways graphs misrepresent data: distorting the scale. Zooming in on a small portion of the y-axis exaggerates a barely detectable difference between the things being compared. And it's especially misleading with bar graphs, since we assume the difference in the size of the bars is proportional to the values.

But the scale can also be distorted along the x-axis, usually in line graphs showing something changing over time. This chart showing the rise in American unemployment from 2008 to 2010 manipulates the x-axis in two ways. First of all, the scale is inconsistent, compressing the 15-month span after March 2009 to look shorter than the preceding six months. Using more consistent data points gives a different picture, with job losses tapering off by the end of 2009. And if you wonder why they were increasing in the first place, the timeline starts immediately after the U.S.'s biggest financial collapse since the Great Depression.

These techniques are known as cherry picking. A time range can be carefully chosen to exclude the impact of a major event right outside it. And picking specific data points can hide important changes in between.

Even when there's nothing wrong with the graph itself, leaving out relevant data can give a misleading impression. This chart of how many people watch the Super Bowl each year makes it look like the event's popularity is exploding. But it's not accounting for population growth. The ratings have actually held steady because, while the number of football fans has increased, their share of overall viewership has not.

Finally, a graph can't tell you much if you don't know the full significance of what's being presented. Both of the following graphs use the same ocean temperature data from the National Centers for Environmental Information. So why do they seem to give opposite impressions? The first graph plots the average annual ocean temperature from 1880 to 2016, making the change look insignificant. But in fact, a rise of even half a degree Celsius can cause massive ecological disruption. This is why the second graph, which shows the average temperature variation each year, is far more meaningful.

When they're used well, graphs can help us intuitively grasp complex data. But as visualization software has made graphs more common across all media, it has also made them easier to use in a careless or dishonest way. So the next time you see a graph, don't be swayed by the lines and curves. Look at the labels, the numbers, the scale, and the context, and ask what story the picture is trying to tell.
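The Chevy example can be made concrete with a little arithmetic. On a bar chart whose y-axis starts above zero, each bar's visible height is its value minus that starting point. As a result, the ratio between the bars no longer matches the ratio between the numbers. The Python sketch below uses the 98% and roughly 96.5% figures from the video. The helper names are made up for illustration. `lie_factor` follows Edward Tufte's "lie factor", which divides the relative change shown in the graphic by the relative change in the data. That measure is swapped in here and is not something the video itself uses.

```python
def drawn_height(value: float, axis_min: float) -> float:
    """Visible bar height when the y-axis starts at axis_min instead of 0."""
    return value - axis_min


def apparent_ratio(a: float, b: float, axis_min: float) -> float:
    """How many times taller bar `a` looks than bar `b` on the given axis."""
    return drawn_height(a, axis_min) / drawn_height(b, axis_min)


def lie_factor(first: float, second: float, axis_min: float) -> float:
    """Tufte-style lie factor: relative change shown in the graphic divided
    by the relative change in the data. 1.0 means the bars are proportional."""
    first_h = drawn_height(first, axis_min)
    second_h = drawn_height(second, axis_min)
    shown = abs(second_h - first_h) / first_h
    actual = abs(second - first) / first
    return shown / actual


# Percent of trucks still on the road, as read off the ad; 96.5 is approximate.
toyota, chevy = 96.5, 98.0

print(f"true ratio:           {chevy / toyota:.3f}")                       # 1.016
print(f"ratio on 95-100 axis: {apparent_ratio(chevy, toyota, 95.0):.3f}")  # 2.000
print(f"lie factor, 95 axis:  {lie_factor(toyota, chevy, 95.0):.1f}")      # 64.3
print(f"lie factor, 0 axis:   {lie_factor(toyota, chevy, 0.0):.1f}")       # 1.0
```

With the axis starting at 0, the lie factor drops back to 1.0. That is why zero-based axes are the usual advice for bar charts, where readers take bar length as the value.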
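The Super Bowl point in the video comes down to dividing by the right denominator. The minimal Python sketch below uses made-up viewer and population counts, which are not real ratings data. It shows how a raw audience count can keep growing while the share of the population watching stays flat. `YearStats` and its fields are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class YearStats:
    year: int
    viewers: float     # people who watched (hypothetical)
    population: float  # people who could have watched (hypothetical)

    @property
    def share(self) -> float:
        """Fraction of the population that tuned in."""
        return self.viewers / self.population


# Made-up numbers for illustration only, NOT real Super Bowl figures.
history = [
    YearStats(2000, 80e6, 200e6),
    YearStats(2010, 90e6, 225e6),
    YearStats(2020, 100e6, 250e6),
]

first, last = history[0], history[-1]
raw_growth = last.viewers / first.viewers - 1
share_growth = last.share / first.share - 1

for s in history:
    print(f"{s.year}: {s.viewers / 1e6:.0f}M viewers, {s.share:.0%} of the population")
print(f"raw audience growth: {raw_growth:.0%}")    # 25%
print(f"growth in share:     {share_growth:.0%}")  # 0%
```

A chart of the `viewers` column alone would show a steady climb. A chart of `share` would be a flat line, which is the picture the ratings give.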
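The ocean temperature example is the reverse of the Chevy bar chart. Because small changes matter there, a narrow axis is the honest choice. Anomaly charts get there by subtracting a reference-period average from each year's value, so the axis is centered on the change rather than on the absolute temperature. The Python sketch below makes three assumptions: the temperatures are made up, the reference period is chosen arbitrarily, and the absolute chart is taken to span 0 to 20 °C. None of these values come from the NCEI data shown in the video.

```python
from statistics import mean

# Hypothetical annual mean ocean temperatures in °C.
# Made-up values for illustration, NOT the NCEI series from the video.
temps = {1900: 16.40, 1950: 16.50, 2000: 16.75, 2016: 16.90}

# Assumed reference period; real anomaly datasets define their own baseline.
BASELINE_YEARS = (1900, 1950)
baseline = mean(temps[y] for y in BASELINE_YEARS)

# An anomaly is each year's value minus the baseline mean.
anomalies = {year: round(t - baseline, 2) for year, t in temps.items()}

# On an absolute chart running from 0 to 20 °C (assumed range),
# the whole rise fills only a sliver of the chart's height.
AXIS_TOP = 20.0
rise = temps[2016] - temps[1900]
print(f"rise: {rise:.2f} °C, or {rise / AXIS_TOP:.1%} of the absolute chart's height")

# Plotted as anomalies, the same rise spans the chart's full range.
for year, a in anomalies.items():
    print(f"{year}: {a:+.2f} °C relative to the baseline")
```

The data are identical in both views. Only the reference point changes, and that choice decides whether a half-degree rise reads as flat or as a clear trend.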
Not a visualization, but very relevant to the sub and probably informative to most. I understand that this is against the rules and will most likely be removed; I just posted it in case the mods see fit to keep it here.
I also think that the amount of data being collected is growing rapidly. The data we're collecting is also getting more and more sophisticated, and we are recording and measuring all sorts of things these days. Hardware to store the data is becoming a non-issue, software to access that data easily is becoming a non-issue, and software to visualize that data any way you want is readily available (often for free!).
With this naturally comes growing demand for more comprehensive and sophisticated visualizations, which means occasionally breaking away from the cornerstone visualizations (e.g., bar charts and scatter plots). Visualizing data is getting more and more complicated, and the gold standard is a moving target. This means the skills required to visualize data well are expanding quickly, too.
The reason I think that's relevant to this post is that we're hitting a point where there's so much data to analyze that the hardest question isn't which visualization to pick, but what information to visualize! I see a lot of visualizations these days that are misleading in the sense that they don't seem comprehensive enough to give people a correct understanding of the "big picture". I think one chart probably won't cut it a lot of the time. Web pages with many visualizations and explanations in between are likely what's needed to really do justice to the data.
In short, you can show someone a beautiful chart with great labels, annotations, and the whole nine yards. But if that chart steers them toward a conclusion that wouldn't hold up once they saw other pieces of the data, then I think the person sharing that visualization also risks misleading them.
For what it's worth, I'm thankful this is posted.
While it doesn't strictly adhere to the rules, it is relevant to nearly all content on this sub.
Public opinion based on misrepresented information is rampant right now, and a critical worldview should be fostered.
Thanks /u/DickTooCold.