A lot of qualitative sociology makes comparisons across social class categories. Many researchers build class into their research designs by selecting subjects using broad criteria, most often education level, income level, or occupation. Depending on the set of questions at hand, the class selection categories will vary, focusing on, for example, upbringing and socialization, access to resources, or occupational outlook.
In the absence of a substantive review, here are a few arbitrarily selected exemplar books from my areas of research:
This post was inspired by the question Caitlyn Collins asked the other day on Twitter:
#soctwitter, I'd love thoughts on how to operationalize social class categories for sampling in qualitative research. Yes, multidimensional evals with income, ed, occupation.. but I want best practices for actual schema to be used as filters for fieldwork, w people, face to face.
She followed up by saying, “Social class is nebulous, but precision here matters to make meaningful claims. What do we mean when we say we’re talking to poor, working class, middle class, wealthy folks? I’m looking for specific demographic questions, categories, scales sociologists use as screeners.” The thread generated a lot of good ideas.
Income, education, occupation
Screening people for research can be costly and time-consuming, so you want to maximize simplicity as well as clarity. So here’s a way of looking at some common screening variables, and what you might get or lose by relying on them in different combinations. This uses the 2018 American Community Survey, provided by IPUMS.org (Stata data file and code here).
I used income, education, and occupation to identify the status of individuals, and generated household class categories by the presence or absence of types of people in each. That means everyone in each household is in the same class category (a choice you might or might not want to make).
Income: Total household income divided by an equivalency scale (for cost of living). The scale counts each adult as 1 person and each child under 18 as 0.70, then raises that count to the 0.70 power. I divided the resulting distribution into thirds, so households are in the top, middle, or bottom third. Top third is what I called “middle/upper” class, bottom third is “lower class.”
Education: I use the BA degree to distinguish households that have a four-year college graduate present (middle/upper) from those that don’t (lower). This is 31% of adults.
Occupation: I used the 2018 ACS occupation codes, and coded people as middle/upper class if their code was 10 to 3550, which are management, business, and financial occupations; computer, engineering, and science occupations; education, legal, community service, arts, and media occupations; and healthcare practitioners and technical occupations. It’s pretty close to what we used to call “managerial and professional” occupations. Together, these account for 37% of workers.
So each of these three variables identifies an upper/middle class status of about a third of people.
For lower class status, you can just reverse them. The exception is income, which is in three categories. For that, I counted households as lower class if their household income was in the bottom third of the adjusted distribution. In the figures below, that means they’re neither middle/upper class nor lower class if they’re in the middle of the income distribution. This is easily adjusted.
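To make the income piece concrete, here is a minimal Python sketch of the equivalence adjustment and three-way classification described above. The function names, cutoff arguments, and example numbers are mine, for illustration; they are not from the ACS/Stata code.

```python
def equivalized_income(hh_income, n_adults, n_children):
    """Adjust household income by the equivalence scale described above:
    each adult counts as 1 person, each child under 18 as 0.70, and the
    resulting count is raised to the 0.70 power before dividing."""
    scale = (n_adults + 0.7 * n_children) ** 0.7
    return hh_income / scale

def income_class(adj_income, bottom_cut, top_cut):
    """Assign the three income categories by thirds of the adjusted
    distribution: top third is middle/upper, bottom third is lower,
    and the middle third is neither."""
    if adj_income >= top_cut:
        return "middle/upper"
    if adj_income < bottom_cut:
        return "lower"
    return "neither"

# Example: two adults, two children, $90,000 household income.
adj = equivalized_income(90_000, 2, 2)
```

In practice the cutoffs would come from the weighted terciles of the adjusted distribution in the survey data.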
You can make Venn diagrams in Stata using the pvenn2 add-on, which I naturally discovered after making these. If you must know, I made these by generating tables in Stata, downloading this free plotter app, entering the values manually, copying the resulting figures into PowerPoint and adding the text there, printing them to PDF, and extracting the images from the PDF with Photoshop. Not a recommended workflow.
Here they are. I hope the visuals might help people think about, for example, who they might get if they screened on just one of these variables, or how unusual someone is who has a high income or occupation but no BA, and so on. But draw your own conclusions (and feel free to modify the code and follow your own approach). Click to enlarge.
First middle/upper class:
Then lower class:
I said draw your own conclusions, but please don’t draw the conclusion that I think this is the best way to define social class. That’s a whole different question. This is just about simple ways to select people to be research subjects. For other posts on social class, follow this tag, which includes this post about class self-identification by income and race/ethnicity.
Here’s an update of a post I wrote two years ago, with some additions.
One reason you, and your students, need to know these things is because they are the building blocks of first-line debunking. We use these facts, plus arithmetic, to ballpark the empirical claims we are exposed to all the time.
This followed my aggressive campaign to teach the undergraduate students in my class the size of the US population (I told you sociology isn’t an easy A). If you don’t know that — and some large portion of them didn’t — how can you interpret statements such as, “On average, 24 people per minute are victims of rape, physical violence, or stalking by an intimate partner in the United States.” In this case the source followed up with, “Over the course of a year, that equals more than 12 million women and men.” But, is that a lot? It’s a lot more in the United States than it would be in China. (Unless you go with, “any rape is too many,” in which case why use a number at all?)
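Claims like this are exactly the kind you can ballpark with arithmetic. A quick check of the numbers behind the quote:

```python
# Ballpark check: 24 victims per minute, scaled up to a year.
minutes_per_year = 60 * 24 * 365
victims_per_year = 24 * minutes_per_year
print(victims_per_year)  # 12,614,400 -- consistent with "more than 12 million"
```

Divide a result like that by the US population and you have a rate you can actually interpret, which is the point of knowing the population in the first place.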
Anyway, just the US population isn’t enough. I decided to start a list of current demographic facts you need to know just to get through the day without being grossly misled or misinformed — or, in the case of journalists or teachers or social scientists, not to allow your audience to be grossly misled or misinformed. Not trivia that makes a point or statistics that are shocking, but the non-sensational information you need to know to make sense of those things when other people use them. And it’s really a ballpark requirement (when I tested the undergraduates, I gave them credit if they were within 20% of the US population — that’s anywhere between 258 million and 387 million!).
I only got as far as 25 facts, but they should probably be somewhere in any top-100. And the silent reporters the other day made me realize I can’t let the perfect be the enemy of the good here. I’m open to suggestions for others (or other lists if they’re out there).
They are rounded to reasonable units for easy memorization. All refer to the US unless otherwise noted. Most of the links will take you to the latest data:
Brad Wilcox has written up his best case for how marriage protects women and girls from violence. I discussed his initial post earlier, but the blowup has prompted me to provide more general advice for the critical data citizen — reader, writer, and editor — who has to decide what to believe when someone comes at them with a data story.
I have some tips about that at the end, but first this elaborate setup.
The information in this section is true
Consider three stories:
When Melanie Thernstrom’s toddler, Kieran, first ate cheese, he immediately had a massive allergic attack. His face swelled, his skin turned red and scaly, and he started gasping for breath. They jumped in their car and rushed to the hospital, where doctors were able to save him.
Chicago mother Tynisha Hilliard had six children in the car when someone opened fire. “Mommy, I’m shot,” said her nine-year-old boy from the back seat. Hilliard immediately sped to the nearest hospital. “My reaction was to save my son. That’s all I can do, save my son,” she said. After emergency surgery for a gunshot wound to the chest, the boy was expected to survive.
When Dodgers catcher A. J. Ellis’s wife, Cindy, went into labor, they hopped in the car and headed for NYU hospital, normally a 35-minute drive. Despite racing through traffic with a police escort, they didn’t make it in time – the baby was born in the back seat – but they arrived at the hospital moments later, met by an emergency crew that whisked mother and child to care and safety in the hospital.
What do these stories have in common? Children’s lives saved by cars.
Is this part of a wider phenomenon? I know what you’re thinking: The pollution from cars hurts children, the vast resources devoted to infrastructure for cars could be spent instead in ways that help children, the need for gas causes wars all the time, and the individualism promoted by car culture contributes to social isolation instead of collective efficacy.
Maybe. But let’s theorize a little. Here are three ways cars might be good for children’s health:
Kids whose families have cars can get them to doctors in an emergency. Considering that in modern societies a lot of what kills children is various kinds of accidents and medical emergencies, this could be a major advantage.
Say what you want about individualism, but it’s emerged as a modern character trait in tandem with the cultural shift that brought us the view of children as priceless individuals. Car culture is a major prop of individualism, so it’s reasonable to hypothesize that people who drive individual cars are more totally devoted to their priceless individual children’s well-being (rather than, say, the well-being of children in general).
Being able to transport oneself at will — any time, any place — may create a sense of self-efficacy, of mastery over one’s environment, which makes people refuse to accept failure (or illness or death), and thus devote themselves more confidently to their survival and the survival of their children.
Don’t take a theoretical word for it, though — let’s go to the data. Here are three small studies.
Cars and children’s health across countries
First we examine the relationship between the number of passenger cars per capita and the rate of child malnutrition in 110 countries (all the countries in the World Bank’s database that have measures of both variables in the last 10 years — mostly poor countries). The largest — India, China, Brazil, and the USA — are highlighted (click to enlarge).
This is a very strong relationship. This single variable, cars per capita, statistically explains no less than 67% of the variation in child malnutrition rates.
But, you liberals object, cars are surely more common in wealthier countries, so this relationship may be spurious. Sure, income and cars are positively correlated (r=.86, in fact). But when I fit a regression model with both per capita income and per capita cars, cars still have a highly significant statistical association with malnutrition (p<.001). (All the regression models are in the appendix at the end.)
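For readers who want to see the mechanics, here is a sketch of that kind of two-predictor regression, done by plain least squares on simulated data. The numbers are invented (this is not the World Bank data); only the correlation between the two predictors, about r = .86, echoes the setup above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 110  # same number of countries as in the analysis above

# Simulated stand-ins: cars and income correlated at roughly r = .86
income = rng.normal(size=n)
cars = 0.86 * income + np.sqrt(1 - 0.86**2) * rng.normal(size=n)
malnutrition = -0.5 * cars - 0.3 * income + rng.normal(scale=0.3, size=n)

# OLS with both predictors at once
X = np.column_stack([np.ones(n), income, cars])
beta, *_ = np.linalg.lstsq(X, malnutrition, rcond=None)

# t-statistic for the cars coefficient, net of income
resid = malnutrition - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
cov = sigma2 * np.linalg.inv(X.T @ X)
t_cars = beta[2] / np.sqrt(cov[2, 2])
```

Even with the two predictors that highly correlated, the model can separate their associations at this sample size; whether that tells you anything causal is, of course, the whole point of the post.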
Cars and child death rates across US states
Second, we take a closer look within the United States. Here there is a lot less variation in both the number of cars and the condition of children. Still, there is a clear relationship between private cars per person and the death rate of children and teenagers: Children are substantially less likely to die in states with more privately owned passenger cars (click to enlarge).
Again, there is less variation in income between U.S. states than there is between countries of the world. But to make sure this is not just a function of state income, I fit a regression model with cars and a control for median household income. The statistical effect of private cars remains significant at the p<.05 level, confirming it is unlikely to be due to chance.
Car commuting and children’s disabilities within the US
Third, let’s go still further, not just comparing US states but comparing children according to the car-driving habits of their parents within the US. For this I got data on children’s disabilities (four kinds of disability) and the means of transportation to work for their parents using the 2010-2012 American Community Survey, with a sample of more than 700,000 children ages 5-11.
Sure enough, children who live with parents who drive to work are substantially less likely to have disabilities than those who don’t live with a parent who drives to work:
Again, could this be because richer families are more likely to include car-driving parents? The regressions (below) show that, although it is true that children in richer households are less likely to have disabilities, the statistical effect of parents’ commuting method remains highly significant in the model that includes household income.
In summary: Children are less likely to be malnourished if they live in a country with more cars per person; they are less likely to die if they live in a state with more cars per person, and they are less likely to have disabilities if they live with parents who commute to work by car. All of these relationships are statistically significant with controls for income (of the country, state, or family). These are facts.
Compare this analysis to the question of marriage and violence. In their piece for the Washington Post (discussed here), Brad Wilcox and Robin Fretwell Wilson wrote about #YesAllWomen:
This social media outpouring makes it clear that some men pose a real threat to the physical and psychic welfare of women and girls. But obscured in the public conversation about the violence against women is the fact that some other men are more likely to protect women, directly and indirectly, from the threat of male violence: married biological fathers. The bottom line is this: Married women are notably safer than their unmarried peers, and girls raised in a home with their married father are markedly less likely to be abused or assaulted than children living without their own father.
With the facts above I can accurately offer this parallel construction:
Some cars pose a real threat to the health and safety of children. But obscured in the public conversation about auto safety, pollution, and environmental degradation is the fact that some other cars are more likely to protect children, directly and indirectly, from threats to their health and safety: cars driven by their own, responsible, caring parents. The bottom line is this: Children in places with more cars — and in families where parents commute by car — are notably healthier than peers without cars.
At the end of his followup post, Brad concludes:
Of course, none of these studies definitively prove that marriage plays a causal role in protecting women and children. But they are certainly suggestive. What we do know is this: Intact families with married parents are typically safer for women and children. … That’s why the conversation about violence against women and girls … should incorporate the family factor into efforts to reduce the violence facing women and girls.
I am equally confident in my conclusion:
Of course, my brief studies don’t definitively prove that cars play a causal role in protecting children’s health and safety. But they are certainly suggestive. What we do know is this: Societies and families with cars are typically safer and healthier for children. That’s why the conversation about children’s well-being should incorporate the car factor into efforts to reduce the harms too many children continue to experience.
Both the marriage story and the car story are misleading data manipulations that substitute data volume for analytical power and present results in a way intended to pitch a conclusion rather than tell the truth.
When is a non-causal story “certainly suggestive”? When the person giving you the pitch wants you to believe the conclusion.
Please do not conclude from this that all data stories are equally corrupt, and everyone just picks the version that agrees with their preconception. Not all academics lie or distort their findings to fit their personal, political, or scientific conclusions. I may be more motivated to criticize Brad Wilcox because I disagree with his conclusions (and there may be people I agree with who use bad methods that I haven’t debunked), but that doesn’t mean I’m dishonest in my interpretation and presentation of evidence. Like a real climate scientist debunking climate-change deniers, I am happy that discrediting him is both morally good and scientifically correct (and I think that’s not a coincidence).
There are two main problems with both the cars story and the marriage story. First is selection into the independent variable condition (marriage and car ownership). People end up in these conditions partly because of their values on the dependent variable. For example, women in marriages are less likely to be raped on average because women don’t want to marry men who have raped them, or likely will rape them — the absence of rape causes marriage. In the case of children with disabilities, there is evidence that children’s disabilities increase the odds their parents will divorce (which means at least one of the parents isn’t in the household and so can’t be a car-commuting parent in the ACS data).
The other main problem is omitted variables. Other things cause both family violence and children’s health, and these are not adequately controlled even if researchers tell you they control for them. Controlling for household income (and other easily-measured demographics) does not capture all the benefits and privileges that married (or car-owning) people have and transfer to their children. For tricky questions of selection and omitted variables, we need to get closer to experimental conditions in order to provide causal explanations.
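A toy simulation makes the selection problem vivid: give the “treatment” (marriage, car ownership) no causal effect at all, let an unmeasured advantage drive both who gets treated and the outcome, and the naive group comparison still looks protective. All the numbers here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

advantage = rng.normal(size=n)                  # unmeasured confounder
treated = (advantage + rng.normal(size=n)) > 0  # selection on advantage
outcome = advantage + rng.normal(size=n)        # true treatment effect: zero

# Naive comparison of treated vs. untreated group means
gap = outcome[treated].mean() - outcome[~treated].mean()
# gap is large and positive even though the treatment does nothing
```

Controlling for observed income wouldn’t help here, because the confounder is unmeasured; that is exactly the omitted-variable problem described above.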
Tips for critical reading
So, based on Wilcox’s car story and my car story, here are practical tips to help you avoid getting hoodwinked by a propagandist with a PhD — or a data journalist looking at a mountain of data and a tight deadline. These are some things to watch out for:
Scatter plot proof
Watch for impressive bivariate relationships, perhaps presented with a mention of control variables but with no mention of the adjusted effect size. That’s what I did with my scatter plots above. If you have adjusted results but don’t show them, you’re selling a small net effect with a big unadjusted label. (Wilcox examples here; Mark Regnerus does this, too.)
A classic example is the Obama food stamp meme, but Wilcox had a great example a few years ago when he wanted to show the drop in divorce that resulted from hard times pulling families together during the recession. If you assume divorce is always going up (it fell for decades), this looks like a dramatic change (he called it “the first annual dip since 2005”):
No head-to-head comparison of alternative explanations
This is a lot to ask, but real social scientists take seriously the alternative explanations for what they observe, and try to devise ways to test them against each other. Editors often see this as low-hanging fruit for removal, because cutting it both shortens the piece and strengthens the argument. In the rape versus marriage story, Wilcox nodded to the alternative explanation that “women in healthy, safe relationships are more likely to select into marriage” — which he called “part of the story” — but he offered nothing to help a reader or editor adjudicate the relative size of that “part” of the story. This connects to the next red flag.
Greater than zero proof
Sometimes just showing that something exists at all is offered as evidence of its importance. That’s why I included three anecdotes about children being saved by private passenger cars — it happened, it’s real. The trick is to identify whether something matters in addition to existing. Here’s a Wilcox example where he showed that a tiny number of people said they didn’t divorce because of the recession; here’s an example in which Nate Cohn at the NYTimes Upshot said that 2% of Hispanics changing their race to White was “evidence consistent with the theory that Hispanics may assimilate as white Americans.” Neither of these provides any comparison to show how important these discoveries were relative to anything else — other reasons people delay divorce? other reasons for race-code changes? — they just exist. This is reasonable if you’re discovering a new subatomic particle, but with social behavior it’s less impressive.
Piles of studies
The reason I presented the car results as the three separate “studies” was to make the point that you can have a lot of studies, but if none of them prove your point it doesn’t matter. For example, in his post Wilcox linked to a series of publications about how children whose parents weren’t married were more likely to be sexually abused, but none of them handle the problem of selection into marriage I described above. Similarly, a generation of research showed that women who have babies as teenagers suffer negative economic consequences, but those effects were all exaggerated because people didn’t take selection into account (women with poor economic prospects are more likely to have babies as teenagers).
Describing one side of inequality as a social good
Let’s say that, in street fights, the person with a gun beats the person with a knife more than 50% of the time. Do we conclude people should have more guns? Some benefits are absolute and have no zero-sum quality to them. (I can’t think of any, but I assume there are some.) Normally, however, we’re talking about relative benefits. The benefits of marriage, or the economic benefits of education, are measured relative to people who aren’t married or schooled.
The typical description of such a pattern is, “This causes a good outcome, we should have more of it.” But we should always consider whether the best thing, socially, might be to reduce the benefit — that is, solve the problems of the people who don’t have the asset in question — rather than try to increase the number of people with the asset.
The benefit of cars that comes from being able to get to the hospital quicker may only be relative to the poor suckers stuck in an ambulance while your personal cars are blocking up Manhattan.
Wolfers’ culminating line, “Vive la révolution!”, suited Scott Winship, who looked over Wolfer’s figures before sniping, “the buzz around the book has come mostly from rich liberal states along the Boston-to-Washington corridor.” But I think they’re both misinterpreting.
According to the Google search data Wolfers used, these were the top 10 states for “piketty” searches (Washington, D.C. excluded): Massachusetts, New York, Connecticut, Maryland, New Jersey, Illinois, Pennsylvania, Wisconsin, Oregon, California.
It looks to me that it’s actually education driving the search data. And that is a big difference. Let me explain.
Microsoft Word tells me that the reading grade level of the publisher’s excerpt is 16.3, so it takes a 16th-grade education to read it. (Note that the “Boston-to-Washington corridor,” which was supposed to sound like a small sliver of the country, has 26% of the country’s college graduates.) So consider income versus college completion, which we can now take as a proxy for being able to read Piketty.
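For reference, Word’s readability statistics are based on the Flesch-Kincaid grade-level formula. Here is a sketch with the counts supplied by hand (counting syllables automatically is the hard part, so I skip it); the example numbers are invented.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level: 0.39 * (words per sentence)
    + 11.8 * (syllables per word) - 15.59."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Long sentences and polysyllabic words push the grade up fast:
grade = flesch_kincaid_grade(words=100, sentences=4, syllables=170)
```

A passage averaging 25 words per sentence and 1.7 syllables per word already scores above 14th grade, so 16.3 implies prose that is dense even by academic standards.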
Wolfers writes, “I can’t tell you where Piketty has been least popular, because below a certain level of search activity, Google doesn’t release the actual numbers.” So he proceeds to leave 24 states out of his analysis (this will become important). Using per-capita income (converted to z-scores), and dropping 24 states plus the ridiculous outlier of DC, this is Wolfers’ income result (my calculations; he just showed scatter plots):
OK, leaving out the bottom half of the Piketty distribution, there is a strong positive relationship between per capita income and Piketty Google searches. Congratulations, you can have three jobs as an economist!
I kid Wolfers. But, come on! I don’t know what kind of data operation they’re running over there at the Upshot, but I would expect Wolfers to take it up a notch. First, control for college completion (percent of folks ages 25+ with a BA or more, also z-scored). See how it shows… oops:
The income effect is reduced but the education effect isn’t significant. (See how I showed you that instead of just going right to the results that support my argument?)
But go back to Wolfers leaving out the bottom half of the Piketty distribution. What’s wrong with that? I’m sure there’s some statistical way of explaining that, but just eyeballing it you’d have to say dropping those cases could cause trouble. The censored cases all have values of -.64 on the search variable. The relationship with income is weaker when the censored cases are included (shown in the red line) versus when he limits it to the top half of Piketty states (blue line):
What to do about this? An easy thing is just to include the censored cases at their values of -.64, just pretending -.64 is a legitimate value. That gives:
Now the income effect is reduced about three-quarters, and the college completion effect is three times as large (with t-stats to match).
But that’s not the best way to handle this. If only economists had invented a way of modeling data with censored dependent variables! Just kidding: there’s Tobin’s Tobit. This kind of model says, I see your censored dependent variable, and I crash it through the bottom of the distribution as a function of its linear relationship to your independent variables. So instead of all being -.64, it lets the censored cases be as low as they want to be, with values predicted by income and college completion. Sort of. Anyway, here’s that result:
Now income is crushed, reduced to literal insignificance. What matters is the percentage of the population that has completed college. It’s not that rich people like Piketty, it’s that college graduates do. Maybe because that’s who can read it. (I don’t know, I haven’t tried.)
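The censoring problem is easy to demonstrate with simulated data. In the sketch below (invented numbers, not the Google data, and ordinary least squares rather than a Tobit fit), pinning the censored cases at the floor attenuates the estimated slope, while dropping them gives a different, larger slope — roughly the red-line versus blue-line pattern above.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
x = rng.normal(size=n)
y = x + rng.normal(scale=0.5, size=n)  # true slope = 1

floor = np.median(y)               # nothing reported below the cutoff
y_floored = np.maximum(y, floor)   # censored cases pinned at the floor
top_half = y > floor               # the drop-the-censored-cases approach

def slope(x, y):
    """OLS slope of y on x."""
    return np.polyfit(x, y, 1)[0]

full = slope(x, y)                         # close to the true slope
floored = slope(x, y_floored)              # attenuated
dropped = slope(x[top_half], y[top_half])  # biased differently again
```

A Tobit model addresses this by treating the floored values as "at or below the floor" in the likelihood rather than as real data points, which is why it gives a different answer than either shortcut.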
What do economists read?
Of course, Wolfers’ analysis and mine are both pretty crude. There are only two reasons his was published on a major news site and mine was buried over here on an obscure sociology blog: (a) he writes for a major news site, and (b) his weak analysis lends itself to an emerging snarky narrative in which rich leftists are seen to whine about inequality but real people can’t be bothered (the main point of Winship’s review) — just reinforcing the echo-chamber model of knowledge consumption that people who are into “data-driven” news like to appear to have risen above.
For a real explanation, Wolfers (and Winship) need look no further than the rest of the Google Correlate results page to see the obvious fact that searches for Piketty are simply correlated with interest in economics. Here’s the search that is most highly correlated with searches for “piketty” across U.S. states: “world bank gdp” (r=.98):
Here are some other searches correlated with “piketty” at .94 or higher:
economic consulting firms
eu data protection
exchange rate data
gdp by sector
journal of labor economics
london school economics
nber working paper
panel data stata
stock market capitalization
the economist intelligence unit
us current account deficit
world bank statistics
Well, there goes your rich, liberal, “American left” theory of who’s driving the Piketty phenomenon. It might be true, but it’s not confirmed by the Google search data. My hot new theory: college educated people who are also interested in economics are disproportionately interested in Piketty.
* The reviewer pool: Mervyn King (The Telegraph), Paul Krugman (New York Review of Books), Tyler Cowen (Foreign Affairs), James K. Galbraith (Dissent), Daniel Schuchman (Wall Street Journal), Justin Fox (Harvard Business Review), Michael Tanner (National Review), John Cassidy (New Yorker), Martin Wolf (Financial Times), Jordan Weissmann (Slate), Steven Pearlstein (Washington Post), Scott Winship (National Review), Heather Boushey (Challenge)
A series of ridiculous data graphics posts from the NY Times Magazine, collected in one post (with crummy photo-pic renderings).
These are examples of the abuse of data graphic techniques to spread ignorance, distract people from anything of actual importance, and contribute to the perception that statistics – especially graphic statistics – are just an arbitrary way of manipulating people rather than a set of tools for exploring data and attempting to answer real questions. (If you are already convinced of this and just want to see awesome real graphics, I would start with Healy and Moody’s Annual Review of Sociology paper.)
First, an innocent graphic that merely wastes space and contributes nothing — it really communicates less than the 8 simple data points it has because the bats all over are just confusing and the points are in no order (who even notices that the number of segments each bat is cut into is the data point?):
Maybe a little better, I suppose, is this one, where the number of trees shown at least corresponds to the data points. But you would still learn more, faster, from a simple list:
Here is an interesting mistake. I first thought these bars were out of order, but it turns out it’s just the top part of the bars that are out of order. If they were flat-topped bars it would be okay:
Here’s one that combines useless graphics with data that is itself completely misleading. These are the fees associated with different parks in NY City. But the units of time are different. What is the point of comparing the annual tennis fee to the hourly roller hockey fee? At least they didn’t make the cards different sizes to show this meaningless comparison more clearly.
The magazine also does text “analytics.” These are on the letters page, and they show the type of letters received. This is interesting to sociologists, who sometimes try to find ways to categorize text. They make two errors here that render these meaningless or worse.
First, they sometimes present the categories as ordered (as the graphic elements imply) when the sentiments expressed are not in any logical order. Like this one, in which the dial and shading imply an ordering that isn’t there:
They also did that here, with the shading implying some continuum that is not present. (In this one, also, is it the proportion of the state’s area that determines the size of the cuts, or the angle of the cuts at the center?) Come on!
A final point holds for all these letter “analytics.” You really shouldn’t decide how many categories you’re going to use before you read the texts (“Here, go break these letters into four categories”). For the love of God, they don’t even have an “other” category, and the shares always add to 100%.
Now with new figures and notes added at the end — and a new, real life headline and graph illustrating the problem in the middle!
The short answer is, pretty well. But that’s not really the point.
In a previous post I complained about various ways of collapsing data before plotting it. Although this is useful at times, and inevitable to varying degrees, the main danger is the risk of inflating how strong an effect seems. So that’s the point about teen test scores and adult income.
If someone told you that the test scores people get in their late teens were highly correlated with their incomes later in life, you probably wouldn’t be surprised. If I said the correlation was .35, on a scale of 0 to 1, that would seem like a strong relationship. And it is. That’s what I got using the National Longitudinal Survey of Youth. I compared Armed Forces Qualification Test scores, taken in 1999, when the respondents were ages 15-19, with their household income in 2011, when they were 27-31.*
Here is the linear fit between these two measures, with the 95% confidence interval shaded, showing just how confident we can be in this incredibly strong relationship:
That’s definitely enough for a screaming headline, “How your kids’ test scores tell you whether they will be rich or poor.”
In fact, since I originally wrote this, the Washington Post Wonkblog published a post with the headline, “Here’s how much your high school grades predict your future salary,” with this incredibly tidy graph:
No doubt these are strong relationships. My correlation of .35 means AFQT explains 12% of the variation in household income. But take heart, ye parents in the age of uncertainty: 12% of the variation leaves a lot left over. This variable can’t account for how creative your children are, how sociable, how attractive, how driven, how entitled, how connected, or how White they may be. To get a sense of all the other things that matter, here is the same data, with the same regression line, but now with all 5,248 individual points plotted as well (which means we have to rescale the y-axis):
Each dot is a person’s life — or two aspects of it, anyway — with the virtually infinite sources of variability that make up the wonder of social existence. All of a sudden that strong relationship doesn’t feel like something you can bank on with any given individual. Yes, there are very few people from the bottom of the test-score distribution who are now in the richest households (those clipped by the survey’s topcode and pegged at 3 on my scale), and hardly anyone from the top of the test-score distribution who is now completely broke.
But I would guess that for most kids a better predictor of future income would be spending an hour interviewing their parents and high school teachers, or spending a day getting to know them as a teenager. But that’s just a guess (and that’s an inefficient way to capture large-scale patterns).
I’m not here to argue about how much various measures matter for future income, or whether there is such a thing as general intelligence, or how heritable it is (my opinion is that a test such as this, at this age, measures what people have learned much more than a disposition toward learning inherent at birth). I just want to give a visual example of how even a very strong relationship in social science usually represents a very messy reality.
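The arithmetic linking the correlation to “12% of the variation” is just squaring r. As a quick check, here is a minimal simulation — synthetic data, not the NLSY — showing that a correlation of about .35 corresponds to explaining only about 12% of the variance:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
r = 0.35

x = rng.standard_normal(n)
# construct y so that corr(x, y) is approximately r
y = r * x + np.sqrt(1 - r**2) * rng.standard_normal(n)

r_hat = np.corrcoef(x, y)[0, 1]
r_squared = r_hat ** 2
print(round(r_hat, 2))      # close to 0.35
print(round(r_squared, 2))  # close to 0.12: the share of variance "explained"
```

Everything else — the other 88% — is the scatter around the line.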
1. Prediction intervals
I probably first wrote about this difference between the slope and the variation around the slope two years ago, in a futile argument against the use of second-person headlines such as “Homophobic? Maybe You’re Gay.” Those headlines always try to turn research into personal advice, and are almost always wrong.
Carter Butts, in personal correspondence, offered an explanation that helps make this clear. The “you” type headline presents a situation in which you — the reader — are offered the chance to add yourself to the study. In that case, your outcome (the “new response” in his note) is determined by both the line and the variation around the line. Carter writes:
the prediction interval for a new response has to take into account not only the (predicted) expectation, but also the (predicted) variation around that expectation. A typical example is attached; I generated simulated data (N=1000) via the indicated formula, and then just regressed y on x. As you’d expect, the confidence bands (red) are quite narrow, but the prediction bands (green) are large – in the true model, they would have a total width of approximately 1, and the estimated model is quite close to that. Your post nicely illustrated that the precision with which we can estimate a mean effect is not equivalent to the variation accounted for by that mean effect; a complementary observation is that the precision with which we can estimate a mean effect is not equivalent to the accuracy with which we can predict a new observation. Nothing deep about that … just the practical points that (1) when people are looking at an interval, they need to be wary of whether it is a confidence interval or a prediction interval; and (2) prediction interval can (and often should be) wide, even if the model is “good” in the sense of being well-estimated.
And here is his figure. “You” are very likely to be between the green lines, but not so likely to be between the red ones.
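Carter’s point is easy to reproduce. This sketch — my own simulation, loosely following his setup (n = 1,000, residual standard deviation of .25, so the true prediction band has a total width near 1) — computes both kinds of interval at the mean of x, using the standard textbook formulas rather than a stats library:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x = rng.uniform(0, 1, n)
y = 0.5 * x + rng.normal(0, 0.25, n)   # true noise sd = 0.25

# ordinary least squares fit
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
s = np.sqrt(resid @ resid / (n - 2))   # residual standard error

# interval widths at x0 = mean of x (normal approximation, large n)
x0 = x.mean()
sxx = ((x - x.mean()) ** 2).sum()
leverage = 1 / n + (x0 - x.mean()) ** 2 / sxx
z = 1.96
ci_width = 2 * z * s * np.sqrt(leverage)      # confidence band: for the mean
pi_width = 2 * z * s * np.sqrt(1 + leverage)  # prediction band: for a new case
print(round(ci_width, 3))  # very narrow
print(round(pi_width, 3))  # roughly 1, as in Carter's note
```

The confidence band shrinks toward zero as n grows; the prediction band never gets narrower than the noise itself.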
2. Random other variables
I didn’t get into the substantive issues, which are outside my expertise. However, one suggestion I got was interesting: What about happiness? Without endorsing the concept of “life satisfaction” as measured by a single question, I still think this is a nice addition because it underscores the point of wide variation in how this relationship between test scores and income might be experienced.
So here is the same figure, but with the individuals coded according to how they answered the following question in 2008, when they were ages 24-28: “All things considered, how satisfied are you with your life as a whole these days? Please give me an answer from 1 to 10, where 1 means extremely dissatisfied and 10 means extremely satisfied.” In the figure, Blue is least satisfied (1-6; 21%), Orange is moderately satisfied (7-8; 46%), and Green is most satisfied (9-10; 32%).
Even if you squint you probably can’t discern the pattern. Life satisfaction is positively correlated with income at .16, and less so with test scores (.07). Again, significant correlation — not helpful for planning your life.
* I actually used something similar to AFQT: the variable ASVAB, which combines tests of mathematical knowledge, arithmetic reasoning, word knowledge, and paragraph comprehension, and scales them from 0 to 100. For household income, I used a measure of household income relative to the poverty line (adjusted for household size), plus one, and transformed by natural log. I used household income because some good test-takers might marry someone with a high income, or have fewer people in their households — good decisions if your goal is maximizing household income per person.
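For concreteness, the income transformation in that footnote looks like this (the dollar amounts here are invented, not from the NLSY):

```python
import numpy as np

# hypothetical household: income of $45,000, poverty line of $22,000 for its size
income_to_needs = 45_000 / 22_000

# add one before logging so zero-income households stay defined (log(1) = 0)
y = np.log(income_to_needs + 1)
print(round(y, 2))
```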
How we describe the directionality of an effect affects how we think about it. Andrew Gelman complains that the recent paper by Dalton Conley and Emily Rauscher does this. It’s called, “The Effect of Daughters on Partisanship and Social Attitudes Toward Women.” And the news headlines were things like, “Does Having Daughters Make You More Republican?” Ross Douthat called it “The Daughter Theory.”
But of course the finding could just as well be described as the effect of sons on making people more liberal.
In this case it’s a great example of boys being the norm and girls being the difference. But there are plenty of examples of when we describe an effect as if its opposite doesn’t exist. Here are three:
The marriage premium. This usually refers to married men earning more than single men. But it is just as much a penalty for being single as it is a reward for being married. In my own work on this I described three possible mechanisms: positive selection into marriage (higher earners marry), productivity-enhancing effects of marriage (wives make men better workers), and discrimination (bosses prefer married men). But all of these could have been expressed in the reverse direction. Lots of “marriage is good” arguments should be turned around to ask, “How could we punish single people less”?
The gender gap. President Obama has frequently implied that reducing the gender gap in pay will be good for “middle class families.” Under “Protecting the Middle Class News,” the White House writes that the gender gap “means less for families’ everyday needs, less for investments in our children’s futures, and, when added up over a lifetime of work, substantially less for retirement.” Of course, it also means more for families with employed men. I hate to be a buzzkill on this, but there is no reason to think that reducing gender discrimination just means paying women more. How do we know women are underpaid, instead of men being overpaid?
Returns to education. This one is tricky, because there is a return on investment from education, so it’s reasonable to talk about the effect in that direction: you spend money on education, you get a benefit. But the society that rewards education also penalizes lack of education, relatively speaking (unless everyone is equally educated). Nothing against educated people, but to reduce inequality it would be good to reduce the returns to education. For example, raising the minimum wage, or providing government jobs to low-skilled workers, would reduce returns to education (if that is operationalized as the difference between college and non-college wages).
I never read Edward Tufte‘s book The Visual Display of Quantitative Information before. (I have a lot of practice but almost no training in visual presentation of data.)
How do you describe the change in one variable between two points in time? Here’s an example of a “slopegraph” of the kind Tufte likes (many examples here). He takes a list of 15 countries’ government receipts as percentage of GDP for 1970 and 1979, and produces this simple graph:
He likes it because all the ink is data (he’s inexplicably invested in the conservation of ink). And he likes how it’s easy to see the change for each country, as well as the two ranked lists for each time point, and those with unusual changes, such as Britain, the only country with a decline. Those are strengths, and this kind of graph is often great. An alternative is a change scatter plot. Here it is with the same data:
In this you can see the overall upward movement (points over the red line), and specifics such as the three countries that moved as a group from 40-50 percent range to the 50-60 percent range. It also allows a vertical reading, to make comparisons between countries that started the 1970s similarly, such as Switzerland and Greece, Italy and the US, Belgium and Canada — to see how they diverged, with Switzerland, Italy, and Belgium all moving up more during the decade.
I think the scatter plot approach is especially helpful when you want to see how the change differs at different points in a distribution, or when there are lots of data points.
In a figure from this paper on gender segregation among managers we used it to show how the pace of women’s advance into managerial occupations stalled in the 1990s, by overlaying changes from two time periods on the same figure:
The fact that these lines are essentially parallel is useful and clearly shown. You could make this graph as a slopegraph with three columns, showing two changes, but I don’t think you’d see the pattern as well.
Here’s one I made for something else but haven’t used yet, showing the decline of manufacturing in 50 large metro areas over three decades. In this one they’re all compared with 1980, creating vertical columns of white, gray and black dots over each MA’s 1980 starting point.
Tufte would call all that white space above the diagonal a big waste.
In the Tufte example above there aren’t many cases so you could label them all. In my marriage example you can figure out the countries based on short abbreviations because the names are familiar. And in the managerial occupations or metro areas it’s the shape of the cloud that matters, so it’s OK not to label them.
Here is an example with a lot of cases, each of which is labeled, from an op-ed by Stephanie Coontz in the New York Times, showing the change in the gender composition of occupations from 1980 to 2010. This one adds a categorical scheme that is supposed to make the types of changes more easily discernible. So those in the top gray box are female-dominated, those in the bottom gray box are male-dominated, and those in the middle are integrated. Green lines denote occupations that entered the integrated zone; red lines denote occupations that became more segregated.
This has a lot of information, but it doesn’t do much more for me than a table would. And the categorical color scheme hides a number of occupations that changed a lot but remained within the arbitrary categories (gray lines). By converting it to a change scatter plot, you can get a sense of the overall pattern of change, and still isolate those with big changes. In the version here I’ve only tagged the ones that changed 20 percentage points or more, so a lot of information is lost, but the graph is a lot smaller, so you could afford to add some text with additional detail.
Here you quickly see that most occupations became more female. And there is a clump of occupations that changed a lot but remained in the middle-range category — medical, education, and human resource managers, and accountants. These were grayed out in the Times version, but they integrated dramatically so you should notice them.
This might not be the best example, but I like this method of showing within-case changes over time.
After a literature review that is a model of selective and skewed reading of previous research (worth reading just for that), they use state marriage promotion funding levels* in a year- and state-fixed effects model to predict rates of marriage, divorce, children living with two parents or one parent, nonmarital births, and poverty and near-poverty, each in a separate model with no control variables, for the years 2000-2010, using the American Community Survey.
To find beneficial effects — no easy task, apparently — they first arbitrarily divided the years into two periods. Here is the rationale for that:
We hypothesized that any HMI [Healthy Marriage Initiative] effects were weaker (or nonexistent) early in the decade (when funding levels were uniformly low) and stronger in the second half of the decade (when funding levels were at their peak).
This doesn’t make sense to me. If funding levels were low and there was no effect in the early period, and then funding levels rose and effects emerged in the later period, then the model for all years should show that funding had an effect. Correct me if I’m wrong, but I don’t think this passes the smell test.
Then they report their beneficial effects, which are significant if you allow them p<.10 as a cutoff, which is kosher under house rules because they had directional hypotheses.
However, then they admit their effects are only significant because they included Washington, DC. That city had per capita funding levels about 9-times the mean (“about $22” versus “about $2.50”), and had an improving family well-being profile during the period (how much of an outlier DC is on the dependent variables they didn’t discuss, and I don’t have time to show it now, but I reckon it’s pretty extreme, too). To deal with this extreme outlier, they first cut the independent variable in half for DC, bringing it down to about 4.4-times the mean and a third higher than the next most-extreme state, Oklahoma (itself pretty extreme). That change alone cut the number of significant effects down from six to three.
Then, in the tragic coup de grâce of their own paper, they remove DC from the analysis, and nothing is left. They don’t quite see it that way, however:
But with the District of Columbia excluded from the data (right panel of Table 3), all of the results were reduced to nonsignificance. Once again, most of the regression coefficients in this final analysis were comparable to those in Table 2 (right panel) in direction and magnitude, but they were rendered nonsignificant by a further increase in the size of the standard errors.
Really. What does “comparable in direction and magnitude” mean, exactly? I give you (for free!) the two tables. First, the full model:
Then, the models with DC rescaled or removed (they’re talking about the comparison between the right-hand panel in both tables):
Some of the coefficients actually grew in the direction they want with DC gone. But two moved drastically away from the direction of their preferred outcome: the two-parent coefficient is 44% smaller, the poor/near-poor coefficient fell 78%.
Some outlier! As they helpfully explain, “The lack of significance can be explained by the larger standard errors.” In the first adjustment, rescaling DC, all the standard errors at least doubled. And all of the standard errors are at least three-times larger with DC gone. I’m not a medical doctor, but I think it’s fair to say that when removing one case triples your standard errors, your regression model is not feeling well.
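A toy simulation shows how a single high-leverage case can do this. Here 50 “states” have funding near $2.50 per capita and no real effect, while one DC-like case sits at $22 with a good outcome; dropping that one case multiplies the slope’s standard error several times over. (All numbers are invented, only loosely mimicking the paper’s scale.)

```python
import numpy as np

rng = np.random.default_rng(2)

def slope_se(x, y):
    """OLS slope and its standard error for a simple bivariate regression."""
    n = len(x)
    xc = x - x.mean()
    b = (xc @ y) / (xc @ xc)
    a = y.mean() - b * x.mean()
    resid = y - a - b * x
    s2 = resid @ resid / (n - 2)
    return b, np.sqrt(s2 / (xc @ xc))

# 50 states with funding around $2.50 per capita; outcome unrelated to funding
x = rng.normal(2.5, 0.5, 50)
y = rng.normal(0, 1, 50)

# add one DC-like outlier: extreme funding AND a "good" outcome
x_dc = np.append(x, 22.0)
y_dc = np.append(y, 2.0)

b_all, se_all = slope_se(x_dc, y_dc)
b_drop, se_drop = slope_se(x, y)
print(round(se_drop / se_all, 1))  # the SE blows up once the outlier is gone
```

The outlier supplies most of the variance in the predictor, so it alone manufactures the apparent precision.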
One other comment on DC. Any outlier that extreme is a serious problem for regression analysis, obviously. But there is a substantive issue here as well. They feebly attempt to turn the DC results in their favor, by talking about its unique conditions. But what they don’t do is consider the implications of DC’s unique change over this time for their analysis. And that’s what matters in a year- and state-fixed effects model. How did DC change independently of marriage promotion funds? Most importantly, 8% of the population during 2006-2010 was new to town each year. That’s four-times the national average of in-migration in that period. This churning is of course a problem for their analysis, which is trying to measure cumulative effects of program spending in that place — hard to do when so many people moved there after the spending occurred. But it’s also not random churning: the DC population went from 57% Black to 52% Black in just five years. DC is changing, and it’s not because of marriage promotion programs.
Finally, their own attempt at a self-serving conclusion is the most damning:
Despite the limitations, the current study is the most extensive and rigorous investigation to date of the implications of government-supported HMIs for family change at the population level.
Ouch. Oh well. Anyway, please keep giving the programs money, and us money for studying them**:
In sum, the evidence from a variety of studies with different approaches targeting different populations suggests a potential for positive demographic change resulting from funding of [Marriage and Relationship Education] programs, but considerable uncertainty still remains. Given this uncertainty, more research is needed to determine whether these programs are accomplishing their goals and worthy of continued support.
*The link to their data source is broken. They say they got other data by calling around.
**The lead author, Alan Hawkins, has received about $120,000 in funding from various marriage promotion sources.
Brad Wilcox wrote a blog post for the Atlantic the other day, in which he described the well-known pattern by which children of married parents on average grow up richer and more highly educated than those raised by single parents. (Follow Wilcox’s lies, errors, and shenanigans under this tag.)
It’s old news, but before I make today’s point, here are a few reasons this kind of thing is wrong and useless.
1. Although the headline says, “Marriage Makes Our Children Richer,” the data Wilcox shows does not approach a causal model. Comparing children who lived with married parents as adolescents to those who did not when they are young adults, he uses controls for mother’s education, race/ethnicity, and household income. Those married parents differ from the single parents in many more ways than that, and did before they got married. Wilcox and actual researchers know this. The Atlantic business editor apparently doesn’t.
2. Even to the extent that marriage helps married people, which it does, on average, this does not imply that mothers who are currently not getting married would get those benefits if they did get married. Because, who are they going to marry? If rich Prince Charming were there, most of them would have married him already. So to consider the effects of them marrying, you have to take into account that it’s not the right guy or the right relationship at the right time. So, good luck.
3. Finally, so, you gonna promote marriage? We’ve seen how that works. On the other hand, we know we can mitigate a lot of the harm from difficult childhoods by throwing jobs and money at their food, healthcare, and education needs. If you care about poverty and inequality more than marriage, that’s the way to go.
Anyway, my complaint today is about a particular kind of deception that Wilcox likes to engage in, which Mark Regnerus also did in his infamous paper. The trick is to display unadjusted figures, but describe them as if they include statistical controls. First, how Wilcox did it this time, then a simple example of how wrong it is.
Wilcox shows this figure, among others:
This is supposedly how marriage makes children richer, because most of the blue bars (“intact family”) are taller than the red bars (“non-intact family”). Set aside what should be the obvious conclusion: having a mother who went to college matters much more than whether your parents were married (which we also already knew). I want to focus on the little symbols *^, which indicate a statistically significant difference with the different controls he used. This is his footnote:
An asterisk (*) indicates a statistically-significant difference (p < 0.05) between respondents who lived with both, married biological parents at Wave I compared with respondents from other family structures, controlling for respondent’s age and race/ethnicity. A hat (^) indicates that there was still a statistically-significant difference when Wave I household income was added as an additional control.
But the numbers shown in the figure are not adjusted for those controls. Presumably, the family structure differences would be smaller with the controls — and they’re already pretty small.
We don’t have his underlying numbers (and I wouldn’t expect to see them in a peer-reviewed journal anytime soon; Regnerus never reported his). So I made a simple example to show how misleading this is. I took the employed 25-55 year-old non-Hispanic White and Black men from the 2011 American Community Survey (excluding the richest 5%) and compared their earnings with and without controls for education, age, hours and weeks worked in the previous year, and marital status. The question is, how much more do White men earn? These are the simple regression results:
In the first model, the intercept is the mean earnings for White men, and the Black coefficient is the difference between the White and Black means. This is the unadjusted difference — $13,551 — which is the equivalent of what Wilcox plotted in the graph. But with the controls the difference is reduced to $5,498 — a big reduction. The comparison is illuminating because it shows how much of the overall gap is accounted for by the distribution of the control variables for Black versus White men.
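To see the mechanics, here is a simulation in the same spirit (invented numbers, not the ACS): the raw gap between groups mixes the direct difference with the groups’ different distributions on a control variable, and the regression strips the control’s contribution out.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
black = rng.random(n) < 0.3

# education (the control) is distributed differently by group
educ = rng.normal(14, 2, n) - 1.0 * black

# earnings depend on education AND a direct group difference, plus noise
earnings = 30_000 + 4_000 * (educ - 12) - 5_000 * black + rng.normal(0, 8_000, n)

# unadjusted gap: simple difference in group means
gap_raw = earnings[~black].mean() - earnings[black].mean()

# adjusted gap: coefficient on the group dummy with education controlled
X = np.column_stack([np.ones(n), black.astype(float), educ])
beta, *_ = np.linalg.lstsq(X, earnings, rcond=None)
gap_adj = -beta[1]

print(round(gap_raw))   # about 9,000: includes the education difference
print(round(gap_adj))   # about 5,000: the gap net of education
```

Reporting the raw gap while footnoting the controls — Wilcox’s move — is selling the first number with the second number’s asterisks.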
If Wilcox did this exercise, however, he would produce a graph like this:
See how he did that? He’s selling a $5,498 difference with a $13,551 label. He did the same dishonest thing in his “Knot Yet” report, with Kay Hymowitz and others.
When your audience is ideological foundation bigwigs and credulous (at best) editors, these asterisks and footnotes just make you look smart. These people are apparently impervious to honest reasoning. For the rest of us, at least, it can be a lesson in how not to do research.
ADDENDUM: How should you do it?
Conrad Hacket below asks what I suggest as a better way to represent the data. Sometimes the unadjusted difference is important even if it is statistically accounted for by some control variable. In the case of race differences in earnings, for example, the fact that there is a $13k+ gap is itself socially important. However, if you are going to make some argument about its importance net of the controls, this is how I would do it, given this very simple linear model, with no interactions or any fancy stuff (note I used non-transformed earnings and censored the top 5% — those at $150k+ — so that the coefficients would be easily interpretable in dollars without being too skewed by the richy-rich).
Using the regression coefficients and the grand means, you sum the products of the means and coefficients for each group, like this:
And then graph the results with a label like this:
Another reasonable strategy instead of using the grand means is to use a common scenario for the calculation, such as a married high school graduate, age 35, who works full-time year-round. Or various other methods of obtaining predicted values.
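A sketch of that grand-means calculation, in code. Every coefficient and mean below is a hypothetical placeholder except the $5,498 adjusted gap from the earlier table; the point is only the arithmetic of summing coefficient-times-mean products:

```python
# hypothetical coefficients from a model like the one described:
# earnings = b0 + b_black*black + b_educ*educ + b_age*age + b_hours*hours + b_weeks*weeks
coefs = {"intercept": -20_000, "black": -5_498, "educ": 3_000,
         "age": 500, "hours": 400, "weeks": 300}

# grand means of the controls, pooled over both groups (made-up values)
means = {"educ": 13.5, "age": 40.0, "hours": 42.0, "weeks": 48.0}

def predicted(black):
    """Predicted earnings at the grand means of the controls, for black = 0 or 1."""
    yhat = coefs["intercept"] + coefs["black"] * black
    for var, m in means.items():
        yhat += coefs[var] * m
    return yhat

gap = predicted(0) - predicted(1)
print(round(gap))  # 5498 by construction: the adjusted difference
```

Because both groups get the same grand means, everything but the group coefficient cancels, and the plotted bars differ by exactly the adjusted gap — which is the honest label to put on them.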