Category Archives: Me @ work

How well do teen test scores predict adult income?

Now with new figures and notes added at the end!

The short answer is, pretty well. But that’s not really the point.

In a previous post I complained about various ways of collapsing data before plotting it. Although this is useful at times, and inevitable to varying degrees, the main danger is the risk of inflating how strong an effect seems. So that’s the point about teen test scores and adult income.

If someone told you that the test scores people get in their late teens were highly correlated with their incomes later in life, you probably wouldn’t be surprised. If I said the correlation was .35, on a scale of 0 to 1, that would seem like a strong relationship. And it is. That’s what I got using the National Longitudinal Survey of Youth. I compared the Armed Forces Qualification Test scores, taken in 1999, when the respondents were ages 15-19, with their household income in 2011, when they were 27-31.*

Here is the linear fit between these two measures, with the 95% confidence interval shaded, showing just how confident we can be in this incredibly strong relationship:

afqt-linear

That’s definitely enough for a screaming headline, “How your kids’ test scores tell you whether they will be rich or poor.” And it is a very strong relationship – that correlation of .35 means AFQT explains 12% of the variation in household income.
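For anyone who wants to check the arithmetic, here is a minimal sketch of how a correlation of .35 turns into "12% of the variation." The only input taken from the post is the correlation itself; the variable names are mine. It also shows how much spread is left around the line, in standard-deviation terms.

```python
import math

r = 0.35  # correlation reported in the post

# Share of variance in income accounted for by the test score
r_squared = r ** 2

# Share of variance left over for everything else
unexplained = 1 - r_squared

# Spread of the residuals relative to the original spread of income
residual_sd_share = math.sqrt(unexplained)

print(f"explained: {r_squared:.1%}, unexplained: {unexplained:.1%}")
print(f"residual SD as a share of the original SD: {residual_sd_share:.2f}")
```

In other words, knowing the test score shrinks the typical miss in predicting (logged) income by only about 6 percent.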

But take heart, ye parents in the age of uncertainty: 12% of the variation leaves a lot left over. This variable can’t account for how creative your children are, how sociable, how attractive, how driven, how entitled, how connected, or how White they may be. To get a sense of all the other things that matter, here is the same data, with the same regression line, but now with all 5,248 individual points plotted as well (which means we have to rescale the y-axis):

afqt-scatter

Each dot is a person’s life — or two aspects of it, anyway — with the virtually infinite sources of variability that make up the wonder of social existence. All of a sudden that strong relationship doesn’t feel like something you can bank on with any given individual. Yes, there are very few people from the bottom of the test-score distribution who are now in the richest households (those clipped by the survey’s topcode and pegged at 3 on my scale), and hardly anyone from the top of the test-score distribution who is now completely broke.

But I would guess that for most kids a better predictor of future income would be spending an hour interviewing their parents and high school teachers, or spending a day getting to know them as a teenager. But that’s just a guess (and that’s an inefficient way to capture large-scale patterns).

I’m not here to argue about how much various measures matter for future income, or whether there is such a thing as general intelligence, or how heritable it is (my opinion is that a test such as this, at this age, measures what people have learned much more than a disposition toward learning inherent at birth). I just want to give a visual example of how even a very strong relationship in social science usually represents a very messy reality.

Post-publication addendums

1. Prediction intervals

I probably first wrote about this difference between the slope and the variation around the slope two years ago, in a futile argument against the use of second-person headlines such as “Homophobic? Maybe You’re Gay.” Those headlines always try to turn research into personal advice, and are almost always wrong.

Carter Butts, in personal correspondence, offered an explanation that helps make this clear. The “you” type headline presents a situation in which you, the reader, are offered the chance to add yourself to the study. In that case, your outcome (the “new response” in his note) is determined by both the line and the variation around the line. Carter writes:

the prediction interval for a new response has to take into account not only the (predicted) expectation, but also the (predicted) variation around that expectation. A typical example is attached; I generated simulated data (N=1000) via the indicated formula, and then just regressed y on x. As you’d expect, the confidence bands (red) are quite narrow, but the prediction bands (green) are large – in the true model, they would have a total width of approximately 1, and the estimated model is quite close to that. Your post nicely illustrated that the precision with which we can estimate a mean effect is not equivalent to the variation accounted for by that mean effect; a complementary observation is that the precision with which we can estimate a mean effect is not equivalent to the accuracy with which we can predict a new observation. Nothing deep about that … just the practical points that (1) when people are looking at an interval, they need to be wary of whether it is a confidence interval or a prediction interval; and (2) prediction interval can (and often should be) wide, even if the model is “good” in the sense of being well-estimated.

And here is his figure. “You” are very likely to be between the green lines, but not so likely to be between the red ones.

CarterButtsPredictionInterval
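Here is a rough sketch of the same idea in code. It is not Carter's simulation: his formula isn't reproduced in this post, so the slope (0.5), the noise level (0.25, picked so the true prediction band is about 1 wide), and the uniform x values are all my assumptions. It fits a simple regression by hand and compares the width of the 95% confidence band with the width of the 95% prediction band.

```python
import math
import random

random.seed(0)

N = 1000
SIGMA = 0.25  # assumed noise SD; 2 * 1.96 * 0.25 is roughly 1
xs = [random.uniform(0, 1) for _ in range(N)]
ys = [0.5 * x + random.gauss(0, SIGMA) for x in xs]  # assumed formula

# Ordinary least squares for y = intercept + slope * x
x_bar = sum(xs) / N
y_bar = sum(ys) / N
sxx = sum((x - x_bar) ** 2 for x in xs)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
intercept = y_bar - slope * x_bar

# Residual standard error
s = math.sqrt(
    sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / (N - 2)
)

Z = 1.96  # normal approximation to the t quantile, fine for N = 1000


def half_widths(x0):
    """Return (confidence, prediction) half-widths of the 95% bands at x0."""
    leverage = 1 / N + (x0 - x_bar) ** 2 / sxx
    return Z * s * math.sqrt(leverage), Z * s * math.sqrt(1 + leverage)


ci_half, pi_half = half_widths(x_bar)
ci_width, pi_width = 2 * ci_half, 2 * pi_half


def share_inside(which):
    """Share of observed points falling inside the chosen band (0=CI, 1=PI)."""
    hits = 0
    for x, y in zip(xs, ys):
        if abs(y - (intercept + slope * x)) <= half_widths(x)[which]:
            hits += 1
    return hits / N


share_in_ci = share_inside(0)
share_in_pi = share_inside(1)

print(f"confidence band width at mean x: {ci_width:.3f}")
print(f"prediction band width at mean x: {pi_width:.3f}")
print(f"points inside confidence band: {share_in_ci:.1%}")
print(f"points inside prediction band: {share_in_pi:.1%}")
```

The confidence band is narrow because the line is well estimated; the prediction band stays wide because individual outcomes still scatter around that line.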

2. Random other variables

I didn’t get into the substantive issues, which are outside my expertise. However, one suggestion I got was interesting: What about happiness? Without endorsing the concept of “life satisfaction” as measured by a single question, I still think this is a nice addition because it underscores the point of wide variation in how this relationship between test scores and income might be experienced.

So here is the same figure, but with the individuals coded according to how they answered the following question in 2008, when they were age 24-28, “All things considered, how satisfied are you with your life as a whole these days? Please give me an answer from 1 to 10, where 1 means extremely dissatisfied and 10 means extremely satisfied.” In the figure, Blue is least satisfied (1-6; 21%), Orange is moderately satisfied (7-8; 46%), and Green is most satisfied (9-10; 32%).

afqt-scatter-satisfied

Even if you squint you probably can’t discern the pattern. Life satisfaction is positively correlated with income at .16, and less so with test scores (.07). Again, significant correlation — not helpful for planning your life.

* I actually used something similar to AFQT: the variable ASVAB, which combines tests of mathematical knowledge, arithmetic reasoning, word knowledge, and paragraph comprehension, and scales them from 0 to 100. For household income, I used a measure of household income relative to the poverty line (adjusted for household size), plus one, and transformed by natural log. I used household income because some good test-takers might marry someone with a high income, or have fewer people in their households — good decisions if your goal is maximizing household income per person.
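A minimal sketch of the income transformation described in this note, assuming it is simply the natural log of (income divided by the poverty line, plus one). The function name and the dollar amounts are made up for illustration; the last line inverts the formula to show what a value of 3 on the scale would correspond to under that assumption.

```python
import math


def log_income_to_needs(household_income, poverty_line):
    """ln(income / poverty line + 1); poverty_line already reflects household size."""
    return math.log(household_income / poverty_line + 1)


# Hypothetical households, both facing a $20,000 poverty line
no_income = log_income_to_needs(0, 20_000)        # 0 on the scale
at_poverty = log_income_to_needs(20_000, 20_000)  # ln(2), about 0.69

# Working backward: a 3 on the scale is this many times the poverty line
ratio_at_3 = math.exp(3) - 1

print(no_income, round(at_poverty, 2), round(ratio_at_3, 1))
```

Adding one before taking the log keeps households with zero income on the scale instead of sending them to negative infinity.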

Filed under Me @ work

What do doctors, lawyers, police, and librarians Google?

Now with college teachers!

What do doctors, lawyers, police, and librarians Google? I’ll tell you. But first — if you are going to take this too seriously, please stop now.

Data and Method

Using IPUMS to extract data from the 2010-2012 American Community Survey, I count the number of people ages 25-64, currently employed, in a given occupation. I divide that by each state’s population in that age range (excluding Washington DC from all analyses). I enter those numbers into the Google Correlate tool to see which searches are most highly correlated with the distribution of each occupation across states (the tool reports the top 100 most correlated searches). In other words, these are searches that maximize the difference between, for example, high-lawyer and low-lawyer states — searches that are relatively popular where there are a lot of lawyers, and relatively unpopular where there are not a lot of lawyers.
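To make the logic concrete, here is a small sketch of the calculation the method relies on: turn worker counts into per-adult rates by state, then see which search terms track those rates across states. All the numbers, the five fake states, and the two search terms are invented; Google Correlate did this matching internally, so the hand-rolled Pearson correlation below is only my stand-in for it.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical counts of employed lawyers ages 25-64, by state
workers = {"A": 1000, "B": 4000, "C": 1500, "D": 12000, "E": 5000}
# Hypothetical state populations ages 25-64
population = {"A": 1_000_000, "B": 2_000_000, "C": 500_000,
              "D": 3_000_000, "E": 1_000_000}

states = sorted(workers)
rates = [workers[s] / population[s] for s in states]  # lawyers per adult

# Hypothetical standardized search popularity by state, one dict per term
search_index = {
    "general counsel": {"A": -1.0, "B": -0.5, "C": 0.1, "D": 0.4, "E": 1.1},
    "weather radar": {"A": 0.3, "B": -0.2, "C": 0.5, "D": -0.4, "E": 0.1},
}

correlations = {
    term: pearson(rates, [index[s] for s in states])
    for term, index in search_index.items()
}

# Terms ordered from most to least correlated with the occupation map
ranked = sorted(correlations, key=correlations.get, reverse=True)
for term in ranked:
    print(f"{term}: {correlations[term]:.2f}")
```

Dividing by population matters: state D has the most lawyers in this toy example, but state E has the most lawyers per adult.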

Is this what lawyers actually Google? We can’t know. But I think so. Or maybe what people who work in law firms do, or people who live with lawyers. It’s a very sensitive tool. I made this case first in the post, Stuff White People Google. Check that out if you’re skeptical.

For each occupation, I first offer a few highly correlated searches that support the idea that the data are capturing what these people search for. Then I list some of the interesting other hits from each list.

Results

Police

Police per adult

The map of police per adult looks pretty random, but the list of correlated search terms doesn’t. On the list are “security training,” “tsa jobs,” “waist belt,” “weight vest,” and “air marshals.”

After all the security stuff, the only major category left in the 100 searches most correlated with police in the population is women. Specifically, their search taste includes tough actress Rachel Ticotin, body builder Denise Masino, Brazilian actress Alice Braga, actress Rosario Dawson, and, “israeli women.” (Remember, Google suppresses known porn terms, so this is just what got through the filter.) It’s a leap from this data to the statement, “police search for images of these women,” but this is who they would find if that were the case (is this a “type”?):

policewomensearches

Librarians

Librarians per adult

On the other hand, librarians. They are the smallest occupation I tried: the average state population aged 25-64 is only one tenth of one percent librarians. Yet, their distribution leaves an unmistakable trace in the Google search patterns. It especially seems to pick up terms associated with public libraries. Correlated terms include, “cataloguing,” and “quiet hours.” And then there are terms one might ask a librarian about, classic reference-desk questions such as, “which vs that,” “turn off track changes,” “think tanks,” “9/11 commission,” and “irs form 6251”; and term paper topics like Shakespeare titles or “human development report.”

What about the librarians themselves, or those close to them? Could it be they who are searching for Ann Taylor dresses, Garnet Hill free shipping, Lands End home, and textile museums? We can’t know for sure. Of course, if anyone knows how to cover their search tracks, it might be this crowd.

Doctors

Doctors per adult

You know they’re doctors, because the search terms most correlated with the map include “md, mph,” “md, phd,” “nejm,” “journal medicine,” “tedmed,” and “groopman.” What else do they like? Chick Corea, Tina Fey, Larry David, Mad Men (season 1) and The West Wing, Laura Linney, John Oliver, Scrabble 2-letter words, and a bunch of Jewish stuff.

Lawyers

Lawyers per adult

That’s the map of lawyers per adult across states. Is it really lawyers? The top 100 searches correlated with the distribution shown above include “general counsel,” and then a lot of financial terms like, “world economic forum,” “international finance corporation,” and “economist intelligence.” Then there are international travel terms, like, “rate euro dollar,” “royal air,” and “swiss embassy.”

Looks like lawyers in lawyer-land are richer and more finance-oriented than lawyers in general. On the cultural side, they search for clothing terms Massimo Dutti, Hugo Boss, and Benetton. They apparently like to eat at Zafferano in London, and drink Caipirinhas. Also, they like “vissi,” which is an aria from Tosca but also a Cypriot celebrity; I lean toward the latter, because Queen Rania is also on the list. Finally, they combine their interests in law, finance, and wealthy attractive women by searching for Debrahlee Lorenzana, the “too-hot-for-work” banker.

By popular demand: Post-secondary teachers

postsecondaryperadult

Finally, here without comment are the results for “post-secondary teachers,” which includes any college teacher who didn’t instead specify a specialty, such as “psychologist” or “economist.” (It’s hard to see on the map, but Rhode Island is the highest.) I broke the results into four rough categories:

Academic

attribution
balderdash
bmi index
body image
citation style
cpdl
critical theory
debt to equity
debt to equity ratio
democracy in america
dihedral
economic inequality
economic statistics
economists
educause
edward elgar
effect size
email forward
equals sign
exogenous
feminists
google scholar
growth rates
homomorphism
inflation rate
inflation rates
intelligibility
international study
isomorphic
journal of
journal of nutrition
marginal propensity
marginal propensity to consume
mediating
meters per second
milieu
overlaying
piano sonata
prefrontal
prefrontal cortex
profile of
psychology studies
quick ratio
rejection letter
returns to scale
routledge
scholar
subgroup
superscript
transglutaminase
ways to end a letter

Personal

1% milk
2006 olympics
best pump up songs
crib safety
easy halloween costume
graco snug
handel
ipod history
jackson superbowl
janet jackson superbowl
mastermind game
maxim online
minesweeper
most popular names
napping
national sleep foundation
olympic figure skating
olympics 2006
pairs figure skating
positioning
refereeing
sandra boynton
senior hockey
snl clips
stuff magazine
stumbled upon
toilet training
verum

Musical

1812 overture
acapella group
acapella groups
africa toto
ave verum
for the longest time
it breaks my heart
pdq bach
taylor swift

Birth control

apri
apri birth control
aviane

Conclusions

Poor social scientists, generations of them spending their lives raising a few thousand dollars to ask a few thousand people a few hundred stilted, arbitrary survey questions. Meanwhile, coursing through the cable wires below their feet, and through the air around them, billions of data bits carry so much more potential information about so many more people, in so many intimate aspects of their lives, than we could even dream of getting our hands on. Just think of the power!

Ringfrodo

Note: I’ve done many posts like this. Some use time series instead of geographic variation, some use terms from Google Books ngrams. Browse the series under the Google tag, or check out this selection:

 

 

Filed under Me @ work

Family Inequality wins Charm Quark

I’m pleased to report that the blog has been awarded the Charm Quark, which is third place in the Politics and Social Science category for 2014, from 3 Quarks Daily.

charmquark

The write-up for the award is here. The judge was Mark Blyth, and the post he read was my debunking of the State of Utah’s claim that banning same-sex marriage would make it more likely for kids to be raised by straight married parents. Blyth put my post in the category of “Bullshit Police,” writing:

If social science has a public function this is it. Theory generation and hypothesis testing and all that grad school stuff is all fine and well, but at the end of the day the job is to take the claims of those that want us to think X is Y and sniff it to see if its bullshit. … the winner in this pot is Philip Cohen for his Family Inequality piece on the state of Utah and same sex parenting. Take a causal argument. Test it. Test it again. Pronounce it bullshit. Move along. Move along. Fantastic stuff and first class ‘bullshit police’ work.

It’s very nice to have my work recognized this way. It’s especially gratifying that it was a piece that included original data analysis (and even fixed-effects regressions). I hope I did it right!

3 Quarks is a filter blog that presents posts on “science, design, literature, current affairs, art, and anything else we deem inherently fascinating” six days a week, and original pieces on Mondays. I hope you will visit the site and see what they have to offer.

Thanks to Mark Blyth and the 3 Quarks folks for the boost.

Filed under Me @ work

Peak women, labor force participation edition

I had a great visit at the University of Pennsylvania the other day, and gave a talk titled, “What Happened to the Gender Revolution?” It was an elaboration of the op-ed I wrote last fall, in which I sketched out the stall in progress toward gender equality (a recurring theme, not my discovery) and offered some ideas about getting it moving again.

One objection I got during the talk (rather belligerently, from Herbert Smith) was that I was making a big deal out of women’s labor force share peaking at just under half the total, which is a natural place to peak and so we shouldn’t expect it to keep going up.

peak-woman

My first response was that the feminism-has-gone-too-far gang (Hanna Rosin, Kay Hymowitz, Christina Hoff Sommers, etc.) complains as if women’s progress has already shot past 50/50. Although it hasn’t on almost all measures, there’s also no reason why women couldn’t become dominant. Judging from history, one gender dominating the labor market is hardly an impossibility. So women’s labor force share tapering off as it approaches 50% shouldn’t be considered a natural phenomenon.

But second, and for this I blame my presentation, women’s share of the labor force isn’t the best measure because it also depends on men’s labor force participation, which has been falling since the 1960s. So maybe it’s best to focus on women’s participation rates instead (it is on this measure that the U.S. has slipped behind many other rich countries).

Here are the labor force participation rates for women by age, education, race/ethnicity, and marital status, from 1962 to 2013, from the Current Population Survey, with men for comparison. The dots show the peak year for each trend (click to enlarge).

wlfp

Women’s overall share of the labor force hit 46% in 1994, and has spent the last 20 years within a point of that (as both men’s and women’s rates fell). But if you look at all these groups it’s clear that doesn’t represent the simple slide of women into the home plate of equality. Every line here rose for decades before hitting a peak between 1996 and 2001. And they peaked at different levels: Women with BA degrees peaked at 85%, Black women peaked at 80%, Hispanic women peaked at 68%. Married women peaked at 75%, single women at 82%. And so on.

Maybe all these trends are not being driven by the same underlying forces. But I’m pretty sure it’s not a complete coincidence.

Filed under Me @ work

What’s in a ratio? Teen birth and marriage edition

Even in our post-apocalypse world, births and marriages are still related, somehow.

Some teenage women get married, and some have babies. Are they the same women? First the relationship between the two across states, then a puzzle.

In the years 2008-2012 combined, 2.5 percent of women ages 15-19 per year had a baby, and 1 percent got married. That is, they were reported in the American Community Survey (IPUMS) to have given birth, or gotten married, in the 12 months before they were surveyed. Here’s the relationship between those two rates across states:

teenbirthmarriage1

The teen birth rate ranges from a low of 1.2 percent in New Hampshire to 4.4 percent in New Mexico. The teen marriage rate ranges from .13 percent in Vermont to 2.3 percent in Idaho.

But how many of these weddings are “shotgun weddings,” those where the marriage takes place after the pregnancy begins? And how many of these births follow “gung-ho marriages,” those where the pregnancy follows immediately after the marriage? (OK, I made that term up.) The ACS, which is wonderful for having these questions, is somewhat maddening in not nailing down the timing more precisely. “In the past 12 months” is all you get.

Here is the relationship between two ratios. The x-axis is percentage of teens who got married who also had a birth (birth/marriage). On the y-axis is the percent of teens who had a birth who also got married (marriage/birth).

teenbirthmarriage

If you can figure out how to interpret these numbers, and the difference between them within states, please post your answer in the comments.
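One piece of the puzzle is purely mechanical, and a sketch may help. Both ratios share the same numerator (teens who reported both a birth and a marriage in the past 12 months), so the gap between them within a state just reflects how many more births than marriages there are. The counts below are invented; only the 2.5 percent and 1 percent rates echo the national figures above, and the overlap of 300 is a pure assumption.

```python
# Hypothetical state with 100,000 women ages 15-19
teens = 100_000
births = 2_500     # 2.5 percent had a baby in the past 12 months
marriages = 1_000  # 1 percent got married in the past 12 months
both = 300         # assumed number who reported both events

# x-axis: share of those who married who also had a birth
birth_given_marriage = both / marriages

# y-axis: share of those who had a birth who also got married
marriage_given_birth = both / births

# The ratio of the two axes no longer depends on the overlap at all
axis_ratio = birth_given_marriage / marriage_given_birth

print(birth_given_marriage, marriage_given_birth, axis_ratio)
print(births / marriages)
```

That still leaves the substantive question of ordering (shotgun versus gung-ho) unanswered, since the overlap count says nothing about which event came first.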

 

 

 

Filed under Me @ work

Open thread on the way some people, right, sort of really talk these days

Speaking extemporaneously in public is difficult. Since I’ve been on radio and TV a few times, and then reviewed the tapes afterward, I’ve developed my own internal criticism (drowning out that critic’s voice is sometimes difficult even while I’m talking). And I’ve also become even more aware of how people talk, to the point of speaking back lines I hear, trying out alternative expressions, and generally driving myself nuts.

Anyway, all that “really, sort of, right,” seems to be ascending toward some kind of peak. I heard this passage on the radio recently (no need to identify the speaker, is there?), and had to jot it down. The discussion was about Google and other tech workers and their buses to San Francisco. That’s enough context:

Look, I think, I mean, so all the data suggests, right, from the recent Census in the last two years, that obviously that center city areas are growing faster than suburban areas. But I think what’s actually interesting that’s happening, when you start to think about the city/suburbs divide, is really what we’re starting to see is are cities and suburbs become more and more alike. And that is to say that cities are having to deal with a lot of the issues that suburban areas have dealt with for a long time, right: crime, density, housing, all those issues. And now I think what we’re starting to see is suburbs, for instance, having to think about themselves becoming more attractive to folks who are looking for this urban lifestyle. So you’re starting to see suburban areas really focus on this idea of creative place-making: how do you really create a unique, authentic place, where people want to live. I think the other interesting thing is for suburbs is that they’re connected on transit, right – this idea of transit-oriented development is really important – how can they be connected to the city in terms of becoming a really sort of key node here. And so, you know, I think what we’re seeing, again, is this sort of shift, right, is what we call sort of this blending, of both cities and suburbs. You know, and just for a second to go back to the point about sort of young people and sort of being – not thinking about community as much – I think what’s interesting is you sort of see this shift of technology workers, back to city centers. What’s interesting is that a lot of technology workers are wanting to live in city centers because they want to have access to a unique, diverse community, they want to be engaged in their communities, so you see more of them taking public transit, you see more of them sharing resources. 
So it is about I think this sort of you know, it is perhaps a different perspective, but it is about sort of this engagement that we’re starting to see among young technology workers, Millennials, Creatives, etc., that are really going to sort of not be the problem for our cities, but really help us think about the solutions and what’s sort of to try to fix those issues.

Without picking on individuals (too late), any thoughts?

Filed under Me @ work

Why are only 29% of NYTimes.com front page authors women?

In December I picked a moment to audit the gender composition of authors at the New York Times and Washington Post websites. Not many were women. Here’s a follow-up with more data.

For some context, according to the American Community Survey (IPUMS data extraction tool), there were about 55,000 “News Analysts, Reporters and Correspondents” working full-time, year-round in 2012. Of those, 41% were women. This pool of news writers is small compared with the number of full-time, year-round workers who report their college major was in journalism: about 315,000, of whom 53% are women. Lots of journalism majors work in other careers; lots of news writers weren’t journalism majors.

So, how will the premier newspaper in the country compare?

Methods

I stuck with NYTimes.com, and checked the gender composition of the bylines that appeared on the front page of the website just about every day between January 8 (the first day of their website redesign) and February 9, for 26 observations over 32 days. I checked whenever I thought of it, aiming for once a day and never more than once per calendar day. I excluded those in the “most-emailed” or “recommended for you” lists. I included Op-Eds and Opinion columnists if they were named (e.g., “Friedman: Israel’s Big Question”) but not if they weren’t (e.g., “Op-Ed Contributor: Czar Vladimir’s Illusions”). On average there were 16 bylines on the front page.

Someone — looking at you, Neal Caren — could scrape the site for all bylines, but in the absence of that I figured a simple rule was best. To check the gender of authors, I used my personal knowledge of common names, and when I wasn’t sure Googled the author’s photo and eyeballed it (all the authors I checked had a photo easily accessible). Overall, I counted 421 named authors (including duplicates, as when the same story was on the front page twice or the same author wrote again on a different day).

Results

Twenty-nine percent of the named authors were women (124 / 421). Women outnumbered men once (8-to-6), on February 8 at 2:35 AM. At the most extreme, men outnumbered women 18-to-1, at 8:12 AM on January 14.
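The tallies above reduce to a few divisions, sketched here so the extremes are easier to compare with the overall figure. The counts come straight from this post; the helper name is mine, and the 41% benchmark is the ACS figure for full-time, year-round news writers cited earlier.

```python
def share_women(women, total):
    """Women as a share of all named bylines in a tally."""
    return women / total


overall = share_women(124, 421)        # all named authors counted
best_moment = share_women(8, 8 + 6)    # February 8, women ahead 8-to-6
worst_moment = share_women(1, 1 + 18)  # January 14, men ahead 18-to-1

national_share = 0.41  # women among full-time, year-round news writers
gap = national_share - overall

print(f"overall: {overall:.1%}")
print(f"best single check: {best_moment:.1%}")
print(f"worst single check: {worst_moment:.1%}")
print(f"gap below the occupation average: {gap:.1%}")
```

Because the 421 bylines include repeat appearances by the same authors and stories, these shares describe front-page exposure rather than a headcount of distinct writers.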

Here are the details:

nytimes percent female authors.xlsx

Discussion

The New York Times is just one newspaper, and one employer, but it matters a lot, and the gender composition of the writers featured there is important. According to Alexa, NYTimes.com is the 34th most popular website in the U.S., and the 119th most popular in the world — and the most popular website of a printed newspaper in the U.S. In the JSTOR database of academic scholarship, “New York Times” appeared in 117,683 items in January 2014, 3.7-times more frequently than the next most-common newspaper, the Washington Post.

I don’t know the overall composition of New York Times writers, or their pool of applicants, or the process by which articles are selected for the website front page, so I can’t comment on how they end up with a lower female composition on the website than the national average for this occupation.

However, it is interesting to hold this up to the organizational research on how organization size and visibility affect gender inequality. Analyzing data from almost 300,000 workplaces over three decades, Matt Huffman, Jessica Pearlman and I found strong evidence that larger establishments are less gender segregated. To explain that, we wrote (with references removed for brevity):

Institutional research on organizational legitimacy implies that size promotes gender integration within establishments, because size increases both visibility to the public and government regulatory agencies and pressure to conform to societal expectations. Size is positively correlated with the formalization of personnel policies and other practices, and formalization is thought to reduce gender-based ascription by limiting managers’ discretion and subjectivity and holding decision makers accountable for their decisions.

The New York Times certainly is a high-visibility corporation, and the effects of its staffing practices are splashed all over its products through bylines and the masthead. In fact, maybe that visibility is to thank for the integration it has accomplished already. Of course it’s complicated; we also found that the gender of managers, firm growth, and other factors affect gender integration. Maybe to help figure this out someone should repeat this count over a longer time period to see how it’s changed, and how those changes correspond with other characteristics of the company and its social context.

Filed under In the news, Me @ work

Divorce drop and rebound: paper in the news

My paper on divorce and the recession has been accepted by the journal Population Research and Policy Review, and Emily Alpert Reyes wrote it up for the L.A. Times today. The paper is online in the Maryland Population Research Center working paper collection.

latimes-divorce

Married couples promise to stick together for better or worse. But as the economy started to rebound, so did the divorce rate.

Divorces plunged when the recession struck and slowly started to rise as the recovery began, according to a study to be published in Population Research and Policy Review.

From 2009 to 2011, about 150,000 fewer divorces occurred than would otherwise have been expected, University of Maryland sociologist Philip N. Cohen estimated. Across the country, the divorce rate among married women dropped from 2.09% to 1.95% from 2008 to 2009, then crept back up to 1.98% in both 2010 and 2011.

To reach the figure of 150,000 fewer divorces, I estimated a model of divorce odds based on 2008 data (the first year the American Community Survey asked about divorce events). Based on age, education, marital duration, number of times married, race/ethnicity and nativity, I predicted how many divorces there would have been in the subsequent years if only the population composition changed. Then I compared that predicted trend with what the survey actually observed. This comparison showed about 150,000 fewer than expected over the years 2009-2011:

divorce-fig2

Notice that the divorce rate was expected to decline based only on changes in the population, such as increasing education and age. That means you can’t simply attribute any drop in divorce to the recession — the question is whether the pace of decline changed.
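Here is a stripped-down sketch of that logic. The paper used a model of divorce odds with several predictors; this stand-in swaps that for fixed divorce rates in two made-up age groups, and every number in it is invented. It shows how the expected count comes from applying base-year rates to a later year's population, and why the expected rate can fall from composition alone.

```python
# Hypothetical base-year (2008-style) divorce rates by group
base_rates = {"under_35": 0.030, "35_plus": 0.015}

# Hypothetical married populations by group
base_pop = {"under_35": 12_000_000, "35_plus": 38_000_000}
later_pop = {"under_35": 10_000_000, "35_plus": 40_000_000}


def expected_divorces(rates, pop):
    """Divorces expected if each group kept its base-year rate."""
    return sum(rates[group] * pop[group] for group in rates)


base_expected_rate = expected_divorces(base_rates, base_pop) / sum(base_pop.values())
later_expected = expected_divorces(base_rates, later_pop)
later_expected_rate = later_expected / sum(later_pop.values())

# Hypothetical count actually observed in the later year
observed = 850_000
shortfall = later_expected - observed

print(f"expected rate, base year: {base_expected_rate:.2%}")
print(f"expected rate, later year: {later_expected_rate:.2%}")
print(f"fewer divorces than expected: {shortfall:,.0f}")
```

The expected rate drops between the two years even though no group's rate changed, so only the shortfall beyond that drop is a candidate for a recession effect.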

Further, the interpretation that this pattern was driven by the recession is tempered by my analysis of state variations, which showed that states’ unemployment rates were not statistically associated with the odds of divorce when individual factors were controlled. Foreclosure rates were associated with higher divorce rates, but this didn’t hold up with state fixed effects.

So I’m cautious about attributing the trend to the recession. Unfortunately, this all happened after only one year of ACS divorce data collection, which introduced a totally different method of measuring divorce rates, which is basically not comparable to the divorce statistics compiled by the National Center for Health Statistics from state-reported divorce decrees.

Finally, in a supplemental analysis, I tested whether unemployment and foreclosures were associated with divorce odds differently according to education level. This showed unemployment increasing the education gap in divorce, and foreclosures decreasing it:

Microsoft Word - Divorce PRPR-revision-revision.docx

Because I didn’t have data on the individuals’ unemployment or foreclosure experience, I didn’t read too much into it, but left it in the paper to spur further research.

Aside: This took me a few years.

It started when I felt compelled to debunk Brad Wilcox’s fatuous and deliberately misleading interpretation of divorce trends — silver lining! – at the start of the recession, which he followed up with an even worse piece of conservative-foundation bait. Unburdened by the desire to know the facts, and the burdens of peer review, he wrote in 2009:

judging by divorce trends, many couples appear to be developing a new appreciation for the economic and social support that marriage can provide in tough times. Thus, one piece of good news emerging from the last two years is that marital stability is up.

That was my introduction to his unique brand of incompetence (he was wrong) and dishonesty (note use of “Thus,” to imply a causal connection where none has been demonstrated), which revealed itself most egregiously during the Regnerus affair (the full catalog is under this tag). Still, people publish his un-reviewed nonsense, and the American Enterprise Institute has named him a visiting scholar. If they know this record, they are unscrupulous; if they don’t, they are oblivious. I keep mentioning it to help differentiate those two mechanisms.

Check the divorce tag and the recession tag for the work developing all this.

Filed under Me @ work, Research reports

Change scatter plots

I had never read Edward Tufte’s book The Visual Display of Quantitative Information before. (I have a lot of practice but almost no training in visual presentation of data.)

How do you describe the change in one variable between two points in time? Here’s an example of a “slopegraph” of the kind Tufte likes (many examples here). He takes a list of 15 countries’ government receipts as a percentage of GDP for 1970 and 1979, and produces this simple graph:

tufteexample

He likes it because all the ink is data (he’s inexplicably invested in the conservation of ink). And he likes how it’s easy to see the change for each country, as well as the two ranked lists for each time point, and those with unusual changes, such as Britain, the only country with a decline. Those are strengths, and this kind of graph is often great. An alternative is a change scatter plot. Here it is with the same data:

tuftestata

In this you can see the overall upward movement (points over the red line), and specifics such as the three countries that moved as a group from the 40-50 percent range to the 50-60 percent range. It also allows a vertical reading, to make comparisons between countries that started the 1970s similarly, such as Switzerland and Greece, Italy and the US, and Belgium and Canada, and to see how they diverged, with Switzerland, Italy, and Belgium all moving up more during the decade.
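For anyone who wants to try this, here is a minimal sketch in Python of how a change scatter plot is put together. It is not the code behind the figure above, and the country names and values are made-up placeholders, not Tufte’s data. Each case gets one point, with its starting value on the x-axis and its ending value on the y-axis; the diagonal marks "no change," so anything above it went up. The function names and the optional matplotlib dependency are assumptions made for illustration.

```python
# Hypothetical placeholder values (start, end), NOT Tufte's actual figures.
EXAMPLE = {
    "Country A": (30.0, 38.0),
    "Country B": (45.0, 52.0),
    "Country C": (41.0, 39.0),
    "Country D": (25.0, 25.0),
}


def direction_of_change(data):
    """Sort cases by whether they sit above, on, or below the y = x diagonal."""
    groups = {"up": [], "same": [], "down": []}
    for name, (start, end) in data.items():
        if end > start:
            groups["up"].append(name)
        elif end < start:
            groups["down"].append(name)
        else:
            groups["same"].append(name)
    return groups


def plot_change_scatter(data, path, xlabel="Start", ylabel="End"):
    """Draw the change scatter plot; requires matplotlib (imported lazily)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    xs = [start for start, _ in data.values()]
    ys = [end for _, end in data.values()]
    lo, hi = min(xs + ys), max(xs + ys)

    fig, ax = plt.subplots()
    ax.plot([lo, hi], [lo, hi], color="red")  # the no-change diagonal
    ax.scatter(xs, ys)
    for name, (start, end) in data.items():
        ax.annotate(name, (start, end))
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    fig.savefig(path)
    plt.close(fig)


groups = direction_of_change(EXAMPLE)
print(groups)
```

Calling `plot_change_scatter(EXAMPLE, "change.png")` would write the figure to a file if matplotlib is installed. The "vertical reading" described above comes for free: cases with similar starting values line up over the same stretch of the x-axis.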

I’ve used it in a few cases before, like this graph on changes in marriage rates across 26 countries:

ipums-international-marriage2

I think the scatter plot approach is especially helpful when you want to see how the change differs at different points in a distribution, or when there are lots of data points.

In a figure from this paper on gender segregation among managers we used it to show how the pace of women’s advance into managerial occupations stalled in the 1990s, by overlaying changes from two time periods on the same figure:

wo-scatter

The fact that these lines are essentially parallel is useful and clearly shown. You could make this graph as a slopegraph with three columns, showing two changes, but I don’t think you’d see the pattern as well.

Here’s one I made for something else but haven’t used yet, showing the decline of manufacturing in 50 large metro areas over three decades. In this one they’re all compared with 1980, creating vertical columns of white, gray and black dots over each MA’s 1980 starting point.

ma-manufacturing

Tufte would call all that white space above the diagonal a big waste.

In the Tufte example above there aren’t many cases so you could label them all. In my marriage example you can figure out the countries based on short abbreviations because the names are familiar. And in the managerial occupations or metro areas it’s the shape of the cloud that matters, so it’s OK not to label them.

Here is an example with a lot of cases, each of which is labeled, from an op-ed by Stephanie Coontz in the New York Times, showing the change in the gender composition of occupations from 1980 to 2010. This one adds a categorical scheme that is supposed to make the types of changes more easily discernible. So those in the top gray box are female-dominated, those in the bottom gray box are male-dominated, and those in the middle are integrated. Green lines denote occupations that entered the integrated zone; red lines denote occupations that became more segregated.

30coontz-gr1-popup-v2

This has a lot of information, but it doesn’t do much more for me than a table would. And the categorical color scheme hides a number of occupations that changed a lot but remained within the arbitrary categories (gray lines). By converting it to a change scatter plot, you can get a sense of the overall pattern of change, and still isolate those with big changes. In the version here I’ve only tagged the ones that changed 20 percentage points or more, so a lot of information is lost, but the graph is a lot smaller, so you could afford to add some text with additional detail.

tufte-nyt

Here you quickly see that most occupations became more female. And there is a clump of occupations that changed a lot but remained in the middle-range category — medical, education, and human resource managers, and accountants. These were grayed out in the Times version, but they integrated dramatically so you should notice them.
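The tagging rule used here (label only the cases that moved 20 percentage points or more) is simple to apply before plotting. A minimal Python sketch, assuming each occupation is stored as a (1980, 2010) pair of percent-female values; the occupation names and numbers below are hypothetical placeholders, not the data in the figure, and `big_movers` is an illustrative name:

```python
# Hypothetical percent-female values (1980, 2010), NOT the data in the figure.
OCCUPATIONS = {
    "Occupation A": (20.0, 45.0),
    "Occupation B": (50.0, 55.0),
    "Occupation C": (80.0, 58.0),
    "Occupation D": (35.0, 55.0),
}


def big_movers(data, threshold=20.0):
    """Return {name: change} for cases whose absolute change meets the threshold."""
    changes = {name: end - start for name, (start, end) in data.items()}
    return {name: diff for name, diff in changes.items() if abs(diff) >= threshold}


labels = big_movers(OCCUPATIONS)
print(labels)
```

Only the names in `labels` would get text on the plot; everything else is drawn as an unlabeled point, which keeps the shape of the cloud readable.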

This might not be the best example, but I like this method of showing within-case changes over time.

Filed under Me @ work

Academic puffery watch: ‘Utilizing’ edition

If you split hairs, you can argue there is a use for utilize that differentiates it from use. In the Oxford English Dictionary it’s all pretty circular:

  • Utilise: To make or render useful; to convert to use, turn to account.
  • Use: To put to practical use; esp. to make use of in accomplishing a task.

You could get into variants, inflections, and origins. But it’s not worth it. In academic writing I don’t think people do that. I think they use utilize when they are committing puffery (“The action or practice of ‘puffing’ someone or something; extravagant or undeserved praise, esp. for advertising or promotional purposes; writing, etc., intended to have this effect.”)

So it is with heavy heart that I report what could be a comeback for utilize, or at least a stall in the course of its demise. I have this from two sources. First, from the JSTOR academic database:

utilize-coming-back.-jstor

And second, from the general corpus of published material (mostly books) that is in Google Books, using the American English collection for a longer period:

utilize-coming-back.

Both show a rise of utilize from obscurity to a peak in the 1970s. Note the peak in academia is about twice as high as the peak in the general collection, at 10.7% compared with 5.3%. But both showed very promising declines until the early 2000s. In retrospect, we see the decline was slowing already in the 1990s. We should have been more vigilant.

Maybe this is just a reversal of progress toward pretending we are above excessive puffery. Which I think is a shame.

This all has something to do with this passage from the chapter titled, “Is a Disinterested Act Possible?” in Practical Reason by Pierre Bourdieu (including the length of the sentence itself):

bourdieu-disinterested

Filed under Me @ work