Change scatter plots

I had never read Edward Tufte’s book The Visual Display of Quantitative Information before. (I have a lot of practice but almost no training in the visual presentation of data.)

How do you describe the change in one variable between two points in time? Here’s an example of a “slopegraph” of the kind Tufte likes (many examples here). He takes a list of 15 countries’ government receipts as a percentage of GDP for 1970 and 1979, and produces this simple graph:

tufteexample

He likes it because all the ink is data (he’s inexplicably invested in the conservation of ink). And he likes how it’s easy to see the change for each country, as well as the two ranked lists for each time point, and those with unusual changes, such as Britain, the only country with a decline. Those are strengths, and this kind of graph is often great. An alternative is a change scatter plot. Here it is with the same data:

tuftestata

In this you can see the overall upward movement (points over the red line), and specifics such as the three countries that moved as a group from the 40-50 percent range to the 50-60 percent range. It also allows a vertical reading, to make comparisons between countries that started the 1970s similarly, such as Switzerland and Greece, Italy and the US, and Belgium and Canada, to see how they diverged, with Switzerland, Italy, and Belgium all moving up more during the decade.
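For anyone who wants to try this kind of plot, here is a minimal sketch in Python. It assumes matplotlib is available, which is not necessarily how the figures in this post were made. The numbers are invented placeholders, not Tufte's data, and the names `receipts`, `direction`, and `change_scatter` are hypothetical. The idea is simply to put the earlier value on the x-axis, the later value on the y-axis, and draw a red 45-degree line so that points above it went up and points below it went down.

```python
# Placeholder values (start year, end year) for illustration only;
# these are NOT the figures from Tufte's table.
receipts = {
    "Country A": (30.0, 36.0),
    "Country B": (41.0, 52.0),
    "Country C": (38.0, 35.0),
    "Country D": (25.0, 25.0),
}


def direction(before, after):
    """Say which side of the 45-degree line a point falls on."""
    if after > before:
        return "up"      # above the diagonal: the value rose
    if after < before:
        return "down"    # below the diagonal: the value fell
    return "same"        # exactly on the diagonal: no change


def change_scatter(data, xlabel, ylabel):
    """Plot each case at (before, after) with a red no-change diagonal."""
    # Imported here so the helper above works even without matplotlib.
    import matplotlib.pyplot as plt

    xs = [before for before, _ in data.values()]
    ys = [after for _, after in data.values()]
    fig, ax = plt.subplots(figsize=(5, 5))
    ax.scatter(xs, ys, color="black")
    for name, (x, y) in data.items():
        ax.annotate(name, (x, y), xytext=(4, 4), textcoords="offset points")

    # Same range on both axes so the diagonal really is the no-change line.
    lo, hi = min(xs + ys) - 5, max(xs + ys) + 5
    ax.plot([lo, hi], [lo, hi], color="red")
    ax.set_xlim(lo, hi)
    ax.set_ylim(lo, hi)
    ax.set_aspect("equal")
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    return fig, ax


# Which way did each case move between the two time points?
moves = {name: direction(b, a) for name, (b, a) in receipts.items()}
```

Calling `fig, ax = change_scatter(receipts, "1970", "1979")` and then `fig.savefig("change.png")` would write the figure to a file. Keeping the two axes on the same scale is the one design choice that matters here: without it, the diagonal no longer separates increases from decreases at a glance.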

I’ve used it in a few cases before, like this graph on changes in marriage rates across 26 countries:

ipums-international-marriage2

I think the scatter plot approach is especially helpful when you want to see how the change differs at different points in a distribution, or when there are lots of data points.

In a figure from this paper on gender segregation among managers we used it to show how the pace of women’s advance into managerial occupations stalled in the 1990s, by overlaying changes from two time periods on the same figure:

wo-scatter

The fact that these lines are essentially parallel is useful and clearly shown. You could make this graph as a slopegraph with three columns, showing two changes, but I don’t think you’d see the pattern as well.

Here’s one I made for something else but haven’t used yet, showing the decline of manufacturing in 50 large metro areas over three decades. In this one they’re all compared with 1980, creating vertical columns of white, gray and black dots over each MA’s 1980 starting point.

ma-manufacturing

Tufte would call all that white space above the diagonal a big waste.

In the Tufte example above there aren’t many cases so you could label them all. In my marriage example you can figure out the countries based on short abbreviations because the names are familiar. And in the managerial occupations or metro areas it’s the shape of the cloud that matters, so it’s OK not to label them.

Here is an example with a lot of cases, each of which is labeled, from an op-ed by Stephanie Coontz in the New York Times, showing the change in the gender composition of occupations from 1980 to 2010. This one adds a categorical scheme that is supposed to make the types of changes more easily discernible. So those in the top gray box are female-dominated, those in the bottom gray box are male-dominated, and those in the middle are integrated. Green lines denote occupations that entered the integrated zone; red lines denote occupations that became more segregated.

30coontz-gr1-popup-v2

This has a lot of information, but it doesn’t do much more for me than a table would. And the categorical color scheme hides a number of occupations that changed a lot but remained within the arbitrary categories (gray lines). By converting it to a change scatter plot, you can get a sense of the overall pattern of change, and still isolate those with big changes. In the version here I’ve only tagged the ones that changed 20 percentage points or more, so a lot of information is lost, but the graph is a lot smaller, so you could afford to add some text with additional detail.

tufte-nyt

Here you quickly see that most occupations became more female. And there is a clump of occupations that changed a lot but remained in the middle-range category — medical, education, and human resource managers, and accountants. These were grayed out in the Times version, but they integrated dramatically so you should notice them.
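The tagging rule described above (label only the cases that moved 20 percentage points or more) is easy to express in code. This is a rough sketch in plain Python, not the procedure actually used for the figure; the occupation names and percent-female values are made up for illustration, and `big_movers` is a hypothetical helper.

```python
# Hypothetical percent-female values (1980, 2010), for illustration only.
occupations = {
    "Occupation A": (20.0, 45.0),
    "Occupation B": (50.0, 55.0),
    "Occupation C": (80.0, 58.0),
    "Occupation D": (10.0, 30.0),
}


def big_movers(data, threshold=20.0):
    """Return {name: change} for cases whose absolute change meets the threshold."""
    changes = {name: after - before for name, (before, after) in data.items()}
    return {
        name: change
        for name, change in changes.items()
        if abs(change) >= threshold
    }


# Only these cases would get a text label on the scatter plot;
# the rest are drawn as unlabeled points.
tagged = big_movers(occupations)
```

Using the absolute change means the rule picks up large moves in either direction, regardless of which category box the occupation starts or ends in.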

This might not be the best example, but I like this method of showing within-case changes over time.

One response to “Change scatter plots”

  1. I will confess to never actually having read more than a little Tufte either, and having more experience than training in graphical representation. I generally like the scatterplots better too — but it seems like these are better suited when you’re interested in patterns across cases rather than the cases as cases. So in the occupations graphs if you were interested in patterns of change in segregation, the scatterplot would be better. But if you wanted to know what happened with editors and reporters the slopegraph is better (but the table on which it’s based is better than that because it can show what was happening in 1990 and 2000 too).

    Some of it has to come down to audience. As sociologists I think we’re probably more likely to be interested in the pattern — but a typical reader of the NY Times would probably want to look at trends within occupations (looking first at their own and then at others that interested them…).
