Vis Lies 2015 Gallery

October 29, 2015

It was another great year of lying, cheating, and deception during this year's Vis Lies. Despite Georges, Bernice, and Ken all neglecting to make any promotional materials, word of mouth and some fliers hastily printed at the hotel brought a good and energetic crowd. Here is a quick recap of what we discussed.

An Introductory Warning

The FCC has just passed a law that any organization that produces and exposes a visualization distorting the truth, intentionally or not, will be fined according to its impact on a number of measures that the US Department of Homeland Security has recently published. — NPR October 2015

As we gathered in our meeting room to begin Vis Lies 2015, Georges Grinstein presented this startling fact on the projector. It is a very troublesome declaration, even for those of us meeting to discuss, ultimately, how to avoid lying in visualization. After all, many vis lies are unintentional; can we be held liable for accidental misrepresentations? This sounds like a strong call to step up our efforts to remove distortions from all our visualizations.

So strong, in fact, that many questioned the authenticity of Georges' statement. When someone pointed out that the US FCC does not have the authority to pass laws, it became clear that we all needed to pay attention because we are all a bunch of dirty liars.

Optical Illusions

We all love optical illusions. Where else can you lie and get away with it? Still, it is worth making sure that an optical illusion does not interfere with the understanding of data. As Georges pointed out, simple combinations of lines and shapes can result in unexpected interpretations of the data. Consider, for example, Zöllner's illusion, shown here. The vertical lines are parallel, but the simple addition of superimposed diagonal lines makes them appear to lean in different directions.

Think such an illusion can only happen in a controlled example of simple shapes? Think again. Optical illusions are constructed in many real-world objects both intentionally and unintentionally. A form of Zöllner's illusion is formed by staggering high-contrast tiles in one direction but not the other. The building shown here demonstrates this illusion. Although each floor is level, they appear slanted. This building in Melbourne, Australia was intentionally designed to appear slanted, but the illusion sometimes appears unexpectedly. This illusion is formally known as the Münsterberg illusion but is more commonly known as the Café Wall illusion as it was first noticed on the unintentional tiling of a café wall in Bristol, England.

Optical illusions are all fun and games until they creep into your visualization. Straight lines become bent and relative sizes become obscured. Here on the left we see data represented as the relative areas of circles. Unfortunately, area is a great way to trick the visual system. It would be difficult to guess that the largest circle has over twice the area of any other circle, but the lengths of the bars at right make the very large difference between the largest and smallest values obvious.
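The perceptual trap has a simple geometric root: area grows with the square of the radius, so large differences in value produce deceptively small differences in circle size. A minimal sketch of the arithmetic, using made-up values:

```python
import math

def radius_for_area(value):
    """Radius of a circle whose area is proportional to the given value."""
    return math.sqrt(value / math.pi)

# Hypothetical data: one value is twice the other.
small, large = 10.0, 20.0

# Doubling the area grows the radius by only sqrt(2), about 1.41x,
# which is part of why area comparisons are so easy to misjudge.
ratio = radius_for_area(large) / radius_for_area(small)
print(ratio)  # ~1.414
```

Bars, by contrast, encode the value directly in length, so a 2x value is drawn 2x as long.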

Selectively Picking a Winner

Georges brought up a conversation about selectively presenting data to skew it toward a particular conclusion. The analysis originally comes from Mushon Zer-Aviv in a 2014 blog post on disinformation. The topic focuses on public opinion on abortion. As this is a contentious issue, Gallup takes regular polls on the subject. Here is a summary of the results since the mid-1990s. As can be seen, for well over a decade public opinion has been evenly divided.

These results are pretty straightforward but maybe not all that exciting. So it is not all that surprising that different organizations might want to "spice up" the results. Take, for example, this infographic created by the AP. It shows the same data, but the axes are rescaled to make the difference look much more dramatic. In 2009 you could observe a small shift in the measured attitude, but the chosen scaling of the axis makes it look dramatic. As we can see in the previous plot, the difference is small enough that it flipped back the following year.
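The exaggeration from a rescaled axis is easy to quantify: the drawn height of a point depends on where the axis starts, so shrinking the axis range inflates the apparent ratio between two nearby values. A small illustration with hypothetical poll numbers:

```python
def drawn_ratio(a, b, axis_min):
    """Ratio of the plotted heights of values a and b on an axis
    that starts at axis_min instead of zero."""
    return (b - axis_min) / (a - axis_min)

# Hypothetical poll results: 45% vs 47%, a 2-point difference.
full_axis = drawn_ratio(45, 47, axis_min=0)   # axis starts at 0
truncated = drawn_ratio(45, 47, axis_min=44)  # axis starts at 44

print(full_axis)  # ~1.04: the two marks look nearly equal
print(truncated)  # 3.0: one mark is drawn three times as high
```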

Another example is provided by how a blog on the same site presented the same data in yet a different way. This time, the data are chosen even more selectively. The plot here shows data for only a subset of the participants (anyone older than 29 is discarded). The plot domain is also shortened to the 7 years that demonstrate the trend the authors want to show. The range is tightened to make the small changes look more dramatic. And voilà! We have a trend exactly like the authors want.

If we simply expand the data shown to include all age ranges and extend the graph for another 2 years, we can see that the "trend" is far less dramatic. Once again, public opinion remains about even.

Overly Clever Information

Bernice Rogowitz provided several examples of infographics that tried so hard to be engaging they became misleading or just nonsensical.

First we have this nice infographic on the different types of employers in Idaho. It shows bands with heights proportional to the percentage of employees in that category. To reinforce that it's in Idaho, the bands are embedded in the shape of Idaho.

And this is where we run into trouble. The shape of Idaho is quite irregular, which makes the relative size of the bands look different. Consider, for example, the third band from the top labeled “Trade, Transportation and Utilities.” At 20.2%, this is the tallest band on the map. But the region does not look very big because it happens to be located in the skinny panhandle. Areas like “Education and Health Services” and “Leisure and Hospitality” appear to be as large or larger even though their respective numbers are much smaller. And the bottom band, “Government,” appears to dwarf all other areas even though it is actually (supposed to be) smaller than “Trade, Transportation and Utilities.”

Here is another fun infographic that uses photography. This poor man bound in tape appears to represent public reaction to gang-related crime. I assume that the height of the bonds represents... whatever numbers these are supposed to mean. Seems reasonable, until you notice that the guy at the right, who has 100% of his body covered, represents 55%. Hmm. Maybe we are supposed to normalize the height of the man to 55%. That matches reasonably well, but the shape of the man is so irregular that it is actually rather hard to estimate these heights without a ruler.

This graphic purports to show the locations in the United Kingdom with the most demand for multilingual employees. The big lie here is the size used for each region. The size represents neither the actual geographic size nor the demand for employees (the data the graphic is supposed to show). Looking at the graphic, it is unclear which locations are at the top and bottom of language-skill demand, which is important since almost every location in the UK is listed here. Speaking of which, one of the ten locations is the UK itself. Huh?

Technically, Bernice didn't present this, but I found this infographic hidden in her slides and thought it was funny. The statistic it shows is pretty straightforward: 1 out of every 10 Americans owns an iPod. What it also teaches me is that the American iPod owners are giants and they sometimes accidentally squish Zune owners with their happy dances.

Distracting Hedgehogs

Next, Bernice brought us back to optical illusions, showing us this one that has been circulating the interwebs. What looks like a spiral in this image is actually 4 concentric non-touching circles. This illusion is very similar to the Fraser spiral illusion and the Pinna illusion. Like Zöllner's illusion described above, the effect is caused by regular patterns (the concentric circles) with misaligned parts (the black and white dot pattern composing the circles).

Can such an effect happen in visualization?

You bet. Consider for example the hedgehog technique to show the direction of a vector field. The hedgehogs are usually placed in regular patterns (such as the circles at the far left and the grid at the near left). The hedgehogs create a misaligned stimulus that can warp our perception of the patterns.

Low Color Discrimination

A common problem with color use that Bernice pointed out is the juxtaposition of two colors with too little contrast. Consider the image at far left. It is easy to distinguish the different colors of the fabric and the thread when they are separate and cover large areas of the image. However, these two colors reflect about the same amount of total light, which makes them very hard to distinguish when sewn together (near left, at the red arrow).

When two colors vary quickly such as with the thread sewn in the fabric, this creates a high-frequency color change. Thin features in visualizations such as lines exhibit the same high-frequency changes. Colors that do not change much in brightness are hard to discriminate. Thus, it is important to ensure that lines in a visualization have good contrast to the background. The fancy HUD GPS shown here has problems with color discrimination. This rather critical display is difficult to read when the drawn lines have a similar brightness to the road behind it.
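One way to check for this problem programmatically is to compute the contrast ratio between the line color and the background using the WCAG relative-luminance formula. A sketch, using arbitrary example colors (not taken from the actual HUD display):

```python
def relative_luminance(rgb):
    """WCAG 2.x relative luminance of an 8-bit sRGB color."""
    def channel(c):
        c /= 255.0
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (channel(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    """Contrast ratio between two colors, ranging from 1:1 up to 21:1."""
    lighter = max(relative_luminance(fg), relative_luminance(bg))
    darker = min(relative_luminance(fg), relative_luminance(bg))
    return (lighter + 0.05) / (darker + 0.05)

# Black on white: the maximum possible contrast.
print(contrast_ratio((0, 0, 0), (255, 255, 255)))  # 21.0

# A medium green line over a gray road of similar brightness:
# barely above 1:1, so the line nearly disappears.
print(contrast_ratio((0, 160, 0), (128, 128, 128)))
```

Lines and thin features generally need a much higher contrast ratio than large color regions do.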

Watercolor Effect

Bernice's final demonstration was on the watercolor effect, where a curve comprising two closely spaced lines of differing colors can make a white region appear to be filled with a color. Once again we have an interesting optical illusion that changes how we perceive a visualization.

One place we can get this watercolor effect is in 2D contouring. The contours, if placed right (or wrong, depending on your point of view), can change the interpretation of the background of the image, which may be intended to show some other piece of information.

Parallel Performance

Ken Moreland talked next, and he started with some recent observations he made with parallel program performance. To measure how well an algorithm behaves on a parallel computer, particularly on large parallel computers, the accepted approach is to measure the time it takes to run the algorithm using different numbers of processors. If a parallel algorithm scales well, it will run faster when using more processors.

Because parallel algorithm scalability is measured based on the time response of the algorithm, it is natural to plot the algorithm's running time vs. the number of processors used. An algorithm said to be scaling perfectly will have a running time inversely proportional to the number of processors (double the processors, halve the running time), so the plot of a perfectly scaling algorithm will be a hyperbolic curve, as shown at left. Experts in parallel algorithms are aware of this and compare running times to this perfect hyperbolic curve when judging an algorithm's scalability. The problem with looking at hyperbolic curves is that pretty much every parallel algorithm's time response, including those with very poor scaling, looks almost identical to a hyperbolic curve.

Consider the curve on the left, which shows the time response of a parallel algorithm with a linear overhead with respect to number of processors. This is a very poorly performing algorithm, but it looks almost perfect when plotted. In contrast, when plotting the same data as the processing rate rather than the running time as shown at right, it is very clear at a glance that this algorithm scales poorly.
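The effect is easy to reproduce with a toy model. In the sketch below (a hypothetical cost model, not Ken's measured data), work is divided perfectly across processors but a linear per-processor overhead is added. The time curve still looks roughly hyperbolic, while the rate makes the poor scaling obvious:

```python
def run_time(procs, serial_time=100.0, overhead=0.1):
    """Toy model: perfectly divided work plus a linear per-processor overhead."""
    return serial_time / procs + overhead * procs

def rate(procs):
    """Processing rate (work per second) for the same model."""
    return 1.0 / run_time(procs)

# The times plummet at first, which reads as "nearly perfect scaling,"
# but the rates reveal the algorithm falls far short of ideal speedup.
for p in (1, 2, 4, 8, 16, 32, 64):
    print(p, run_time(p), rate(p))

# Perfect scaling would give rate(64) == 64 * rate(1);
# here the time even creeps back up past 32 processors.
```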

Although the previous three charts are of synthetic data, the same behavior can be seen in the field. The two plots on the left both come from the same performance data measured on a real parallel algorithm. The time response looks pretty close to the desired perfect hyperbolic curve, but the rate response reveals that this algorithm is actually scaling very poorly. See Ken's paper for many more details and examples on the subject.

One Vaccine, Two Stories

Once again, a contentious issue brought up a vis lie as Ken next presented some data on the effectiveness of the measles vaccine. He showed charts from two groups presenting data on measles cases in subtly different ways but telling very different stories.

Here is a chart posted to the Wikipedia page on the measles vaccine. It clearly shows a sharp drop in measles cases in 1964, right when the vaccine was widely administered in the United States. Looking at this plot, you would conclude that the use of the vaccine nearly eradicated the illness in the United States.

But now consider this chart originally posted on a Health Sentinel web site. This chart shows that measles was on its way out well before the vaccine was introduced in 1964. It suggests that the actual effect of the vaccine was minuscule.

Both of these charts show real data yet tell opposite stories. At least one must be lying, but which one?

Before we definitively answer this question, let's take a look at the differences between the two plots. On close inspection, there are actually several subtle differences that might skew the data. First, the Wikipedia chart is showing the number of reported cases whereas the Health Sentinel chart is showing the number of reported deaths caused by measles. Second, the Wikipedia chart shows an absolute number whereas the Health Sentinel chart shows data per capita. Third, the Wikipedia chart shows a fairly narrow range of data (starting in 1944) whereas the Health Sentinel chart shows a much larger range of data (starting in 1900). Any one of these can confound the visual display, so which is lying most?

Fortunately, the confusion can be cleared up with this chart, originally published in "Measles mortality: a retrospective look at the vaccine era" (American Journal of Epidemiology, 1975). This chart shows both the number of cases and the number of deaths by measles, and it shows the trends of both previous charts. The number of cases holds constant whereas the number of deaths started dropping in the early 20th century. As explained in a blog post by Todd W., most deaths from measles occurred from secondary infections, which are more likely to kill the patient if antibiotics are not administered. Antibiotics were invented in the early 20th century and led to the drop in measles deaths. In contrast, vaccines either prevent the initial infection or they don't. Once you catch the disease, the vaccine does no good. Thus, the number of cases is a much better metric for the effectiveness of a vaccine than the number of deaths. We can therefore conclude that the Health Sentinel visualization is lying, and the Wikipedia visualization is not.

Plotting Non-Finite Values

Pat Crossno presented an issue on behalf of Tim Shead. Unlike most of the other examples shown, this proved to be an open question on the proper way to represent non-finite values in plots, which is a design question Tim faces in the implementation of Toyplot.

At left is a standard plot of some (fabricated) data. The finite values and their trends are clearly visible. But what is happening at the ends of the plots where the curves end?

In between the data points are other measurements. Some are infinite whereas others are not-a-number (NaN). But where are they, and what is their meaning?

To rectify the situation, Pat and Tim presented several potential alternate designs. In this first one, the ends of the plots are extended upward with an arrow to signify that they continue on to infinity (further clarified with an infinity symbol). This helps distinguish these trends toward infinity from simple stops in the data or from finite values that might extend past the bounds of the plot due to axis scaling.

However, there are problems with this representation. It was quickly pointed out that the slopes of the arrow segments are generally wrong if interpolating to infinity. Another issue is that it tells nothing about any samples that might be in this gap between finite values.

This next proposed design tries to rectify the issue of representing unknown values by adding an infinity line where infinite values are plotted. This, however, was not well received. It was generally agreed that the infinity line is visually confusing, as it misrepresents infinity as part of the finite space.

This final proposed design uses a slightly different representation for the plot: a filled region rather than a curve. The filled region does, however, provide a better mechanism for visually representing the transition to infinity via the gradient to the background color, as well as showing where the data extend to infinity.

That said, there was some confusion over why there was a gap in the plot. The reason is that this region contained NaN values, which raises the question of why those are not represented. Whether they are represented, and what the representation should be, depends on what the NaNs mean. Perhaps they signify a compute error, in which case they are important. Or perhaps they simply signify that no value is present (for example, when constructing a pivot and the value for a particular cell does not exist), in which case leaving them blank makes sense.
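A concrete way to frame the design question is to split the series into runs of finite values and record why each gap occurred, so a renderer can treat infinities and NaNs differently. A minimal sketch of that bookkeeping (not Toyplot's actual implementation):

```python
import math

def split_finite_runs(values):
    """Partition a series into runs of finite values, plus (index, kind)
    markers for the non-finite samples that separate them."""
    runs, markers, current = [], [], []
    for i, v in enumerate(values):
        if math.isfinite(v):
            current.append(v)
        else:
            if current:
                runs.append(current)
                current = []
            markers.append((i, "nan" if math.isnan(v) else "inf"))
    if current:
        runs.append(current)
    return runs, markers

data = [1.0, 2.0, float("inf"), 3.0, float("nan"), 4.0]
runs, markers = split_finite_runs(data)
print(runs)     # [[1.0, 2.0], [3.0], [4.0]]
print(markers)  # [(2, 'inf'), (4, 'nan')]
```

With the gaps labeled, drawing an arrow for an 'inf' marker and leaving a blank for a 'nan' marker becomes a purely presentational choice.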

In response, the Vis Lies audience brainstormed on more possible representations for non-finite values in a series plot. Here is the mock-up drawn by Jonathan Knispel on one of the note papers provided by the hotel.

We declared this to be Vis Lies' first official poster (albeit a small one).

20 Years of Blunders

Roy Ruddle started by reviewing the paper Top Ten Blunders by Visual Designers, which was published 20 years ago. In quick summary, this paper identifies the following 10 blunders in visual display.

With these blunders in mind, Roy played a game with us. He divided the room into three groups and challenged us to identify the vis blunder in several example images. It was an energetic and fun competition. Scoring was difficult, however, as the room was full of liars. Georges claimed the high score, being the first to identify 7 out of 5 blunders. Unfortunately for Georges, he didn't write this blog. The official winner was Ken's team with a score of 35.2 out of 5.

True Bars that Tell Lies

Anthony Selino presented an example of a vis lie that can have real consequences on policy decisions, which was originally reported in Garr Reynolds' blog.

The image at left was presented by Ted Cruz during a hearing on the President's FY2016 budget request. Cruz used this bar chart to make the argument that NASA is allocating a disproportionately large amount of funding to earth science research, which Cruz argued would be more appropriate in other agencies. According to this graphic, NASA is spending a disproportionate amount of money on inward focused earth science and not enough on its core mission of outward focused space exploration.

But are these conclusions true? Although the numbers used for the visualization are correct, they are very misleading. Note that the numbers presented are percentage change. They have little correlation to the total amount spent or proportion of the budget. An item that starts with very little funding can vary wildly as a percentage of change without making a significant difference in the total budget.
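The distortion is straightforward to demonstrate: percentage change says nothing about absolute size, so a small line item can post a huge percentage swing while a large one barely moves. A quick sketch with hypothetical budget numbers (not the actual NASA figures):

```python
def pct_change(old, new):
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old

# Hypothetical line items, in billions of dollars.
small_old, small_new = 1.0, 1.5    # +50%, but only +$0.5B
large_old, large_new = 15.0, 16.0  # +6.7%, yet +$1.0B

print(pct_change(small_old, small_new))  # 50.0
print(pct_change(large_old, large_new))  # ~6.7

# The "dramatic" 50% growth moves half as many dollars
# as the "modest" 6.7% growth.
```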

Here is a more honest representation of how NASA spends its budget. As you can see, the vast majority of funding is spent on its core mission of space exploration. The total amount spent on earth science is significant, but it is not the disproportionately large allocation implied by the previous graphic.

Force-Directed Lies

Dustin Arendt reminded us to be skeptical of pretty much any visual representation of graphs. Perhaps the most common algorithm for determining the position of elements in a graph is a force-directed layout. A force-directed layout works by designating forces between nodes in a graph (attractive forces between connected nodes, repulsive forces between unconnected ones). The algorithm simulates the movement of these nodes based on these imparted forces to find a steady (or semi-steady) state. The idea, of course, is that at rest the layout would place connected nodes close to each other and unconnected nodes farther away.

This turns out not to be the case in even the simplest of graphs. Consider, for example, the small, uncomplicated graph to the left. It should take little effort to find a nice steady state for a mere 35 nodes, and you would expect the physical locations to represent well the respective relationships among the nodes.

However, the correlation between connected nodes and physical proximity is not as strong as you would hope.

Here at left is the exact same graph with the same layout. In this image Dustin uses red lines to identify every instance where a node's closest neighbor is not directly connected to it. As you can see, there are many examples where nearby nodes are in fact not directly related to each other, suggesting that force-directed layouts are likely performing a disservice when presenting graph structure, even when you don't get a hairball.
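Dustin's red-line check is easy to reproduce: for each node, find its nearest neighbor in the layout and flag the pair when no edge actually connects them. A small sketch using made-up positions (not Dustin's graph or code):

```python
import math

def layout_violations(positions, edges):
    """Flag (node, nearest) pairs where a node's closest neighbor
    in the 2D layout is not actually connected to it."""
    adjacency = {node: set() for node in positions}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    violations = []
    for node, point in positions.items():
        nearest = min(
            (other for other in positions if other != node),
            key=lambda other: math.dist(point, positions[other]),
        )
        if nearest not in adjacency[node]:
            violations.append((node, nearest))
    return violations

# Tiny example: a and b are connected, but unrelated c sits between them.
positions = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.4, 0.0)}
edges = [("a", "b")]
violations = layout_violations(positions, edges)
print(violations)  # every node's nearest neighbor is an unconnected one
```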

Confusing Colors

Maria Zemankova revived some slides from many years ago revealing some rather poor color choices. Good color makes data visible, but bad colors can obscure data. First is an example of a pair of visualizations, which are used together, that use inconsistent colors in a very confusing way.

These two visualizations at left (supposedly) show related trends. Taken alone, the colors are not good, but not terrible. But why do the colors shift around a bit from one visualization to the other? And why, in the B visualization, is the same color used for a slight positive trend and for no data?

Of course, colors don't have to be inconsistent to be bad. It just wouldn't be Vis Lies if no one brought up the rainbow color map. These colors qualify, as they were clearly, naively pulled from rainbow colors. But to make a bad color map even worse, the colors are reversed from their natural perceptual ordering. Cool dark blues, which are supposed to signify the maximum value, feel muted, whereas the red of low values pops out more than any other color.

Two Points = Trend?

We had a field day with line charts this year. This chart at left was recently presented to the US Congress in an argument to cut federal funding to the Planned Parenthood organization. The chart shows a dramatic shift of funds from lifesaving cancer screening and prevention to abortions.

However, a closer inspection reveals that this chart is a big fat liar. The scales of the trend lines are all off, and in fact the amounts spent on the respective services are nowhere close to each other.

Here is a (somewhat) corrected representation of the data in this chart. When presented correctly, it is clear that the amount spent by Planned Parenthood on abortions is far less than on its other services. It is also clear that the amount of funding increase for abortions is greatly exaggerated in the previous chart.

Most likely, previous funding cuts forced Planned Parenthood to reduce the prevention services they spend the majority of their funds on. Ironically, those funding cuts are now used as the reason for more funding cuts.

But the problems with this visualization don't stop there. I can't help but notice that each trend is derived from only two data points (taken at 2006 and 2013). Why only these two dates? Surely the data for the intervening years is also available. Is it really a steady decline, or was there a sharp drop? Has the funding recently dropped or has it stabilized?

Generally, you should be very skeptical whenever a trend line is created from so little data. Drawing a trend line between any two single values is sure to invite unsupported conclusions. Take, for example, the plot at left. As far as I can tell, the numbers come from exactly two samples taken from a survey of computer users for a particular Linguistics Seminar at Penn State (about 20 students each). From these data we might assume that by now Apple is burying surplus electronics in the New Mexico desert.
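The blunder generalizes: a line through exactly two samples fixes a slope, and extrapolating that slope quickly produces nonsense. A sketch with hypothetical market-share numbers (not the actual survey values):

```python
def two_point_trend(x0, y0, x1, y1):
    """Fit a line through exactly two samples and return a predictor."""
    slope = (y1 - y0) / (x1 - x0)
    return lambda x: y0 + slope * (x - x0)

# Hypothetical: Mac share measured at 10% in 2000 and 30% in 2010.
predict = two_point_trend(2000, 10.0, 2010, 30.0)

print(predict(2050))  # the "trend" sails past 100% market share
```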

To be fair, this plot is just an example to show the graphical elements of a chart, not to make a point about the data. Nevertheless, any conclusion drawn from this plot is false.

The problem with drawing a trend from only two sample points is that it is easy to reach any conclusion you like simply by picking the two "right" points (either intentionally or not). This can even lead to humorously malformed conclusions, such as the graph on the left demonstrating the conclusion on the right.