Deciphering a well-told data story, cars are good for kids edition

Brad Wilcox has written up his best case for how marriage protects women and girls from violence. I discussed his initial post earlier, but the blowup has prompted me to provide more general advice for the critical data citizen — reader, writer, and editor — who has to decide what to believe when someone comes at them with a data story.

I have some tips about that at the end, but first this elaborate setup.

The information in this section is true

Consider three stories:

  • When Melanie Thernstrom’s toddler, Kieran, first ate cheese, he immediately had a severe allergic reaction. His face swelled, his skin turned red and scaly, and he started gasping for breath. They jumped in their car and rushed to the hospital, where doctors were able to save him.
  • Chicago mother Tynisha Hilliard had six children in the car when someone opened fire. “Mommy, I’m shot,” said her nine-year-old boy from the back seat. Hilliard immediately sped to the nearest hospital. “My reaction was to save my son. That’s all I can do, save my son,” she said. After emergency surgery for a gunshot wound to the chest, the boy was expected to survive.
  • When Dodgers catcher A. J. Ellis’s wife, Cindy, went into labor, they hopped in the car and headed for NYU hospital, normally a 35-minute drive. Despite racing through traffic with a police escort, they didn’t make it in time – the baby was born in the back seat – but they arrived at the hospital moments later, met by an emergency crew that whisked mother and child to care and safety in the hospital.

What do these stories have in common? Children’s lives saved by cars.

Is this part of a wider phenomenon? I know what you’re thinking: The pollution from cars hurts children, the vast resources devoted to infrastructure for cars could be spent instead in ways that help children, the need for gas causes wars all the time, and the individualism promoted by car culture contributes to social isolation instead of community efficacy.

Maybe. But let’s theorize a little. Here are three ways cars might be good for children’s health:

  • Kids whose families have cars can get them to doctors in an emergency. Considering that in modern societies a lot of what kills children is various kinds of accidents and medical emergencies, this could be a major advantage.
  • Say what you want about individualism, but it’s emerged as a modern character trait in tandem with the cultural shift that brought us the view of children as priceless individuals. Car culture is a major prop of individualism, so it’s reasonable to hypothesize that people who drive individual cars are more totally devoted to their priceless individual children’s well-being (rather than, say, the well-being of children in general).
  • Being able to transport oneself at will — any time, any place — may create a sense of self-efficacy, of mastery over one’s environment, which makes people refuse to accept failure (or illness or death), and thus devote themselves more confidently to their survival and the survival of their children.

Don’t take a theoretical word for it, though — let’s go to the data. Here are three small studies.

Cars and children’s health across countries

First we examine the relationship between the number of passenger cars per capita and the rate of child malnutrition in 110 countries (all the countries in the World Bank’s database that have measures of both variables in the last 10 years — mostly poor countries). The largest — India, China, Brazil, and the USA — are highlighted (click to enlarge).

cars-malnourishment

This is a very strong relationship. This single variable, cars per capita, statistically explains no less than 67% of the variation in child malnutrition rates.

But, you liberals object, cars are surely more common in wealthier countries, so this relationship may be spurious. Sure, income and cars are positively correlated (r=.86, in fact). But when I fit a regression model with both per capita income and per capita cars, cars still have a highly significant statistical association with malnutrition (p<.001). (All the regression models are in the appendix at the end.)

Cars and child death rates across US states

Second, we take a closer look within the United States. Here there is a lot less variation in both the number of cars and the condition of children. Still, there is a clear relationship between private cars per person and the death rate of children and teenagers: Children are substantially less likely to die in states with more privately owned passenger cars (click to enlarge).

cars-deaths-states

Again, there is less variation in income between U.S. states than there is between countries of the world. But to make sure this is not just a function of state income, I fit a regression model with cars and a control for median household income. The statistical effect of private cars remains significant at the p<.05 level, confirming it is unlikely to be due to chance.

Car commuting and children’s disabilities within the US

Third, let’s go still further, not just comparing US states but comparing children according to the car-driving habits of their parents within the US. For this I got data on children’s disabilities (four kinds of disability) and the means of transportation to work for their parents using the 2010-2012 American Community Survey, with a sample of more than 700,000 children ages 5-11.

Sure enough, children who live with parents who drive to work are substantially less likely to have disabilities than those who don’t live with a parent who drives to work:

disab-bars

Again, could this be because richer families are more likely to include car-driving parents? The regressions (below) show that, although it is true that children in richer households are less likely to have disabilities, the statistical effect of parents’ commuting method remains highly significant in the model that includes household income.

In summary: Children are less likely to be malnourished if they live in a country with more cars per person; they are less likely to die if they live in a state with more cars per person; and they are less likely to have disabilities if they live with parents who commute to work by car. All of these relationships are statistically significant with controls for income (of the country, state, or family). These are facts.

One interpretation

Compare this analysis to the question of marriage and violence. In their piece for the Washington Post (discussed here), Brad Wilcox and Robin Fretwell Wilson wrote about #YesAllWomen:

This social media outpouring makes it clear that some men pose a real threat to the physical and psychic welfare of women and girls. But obscured in the public conversation about the violence against women is the fact that some other men are more likely to protect women, directly and indirectly, from the threat of male violence: married biological fathers. The bottom line is this: Married women are notably safer than their unmarried peers, and girls raised in a home with their married father are markedly less likely to be abused or assaulted than children living without their own father.

With the facts above I can accurately offer this parallel construction:

Some cars pose a real threat to the health and safety of children. But obscured in the public conversation about auto safety, pollution, and environmental degradation is the fact that some other cars are more likely to protect children, directly and indirectly, from threats to their health and safety: cars driven by their own, responsible, caring parents. The bottom line is this: Children in places with more cars — and in families where parents commute by car — are notably healthier than peers without cars.

At the end of his followup post, Brad concludes:

Of course, none of these studies definitively prove that marriage plays a causal role in protecting women and children. But they are certainly suggestive. What we do know is this: Intact families with married parents are typically safer for women and children. … That’s why the conversation about violence against women and girls … should incorporate the family factor into efforts to reduce the violence facing women and girls.

I am equally confident in my conclusion:

Of course, my brief studies don’t definitively prove that cars play a causal role in protecting children’s health and safety. But they are certainly suggestive. What we do know is this: Societies and families with cars are typically safer and healthier for children. That’s why the conversation about children’s well-being should incorporate the car factor into efforts to reduce the harms too many children continue to experience.

Another interpretation

Both the marriage story and the car story are misleading data manipulations that substitute data volume for analytical power and present results in a way intended to pitch a conclusion rather than tell the truth.

When is a non-causal story “certainly suggestive”? When the person giving you the pitch wants you to believe the conclusion.

Please do not conclude from this that all data stories are equally corrupt, and everyone just picks the version that agrees with their preconception. Not all academics lie or distort their findings to fit their personal, political, or scientific conclusions. I may be more motivated to criticize Brad Wilcox because I disagree with his conclusions (and there may be people I agree with who use bad methods that I haven’t debunked), but that doesn’t mean I’m dishonest in my interpretation and presentation of evidence. Like a real climate scientist debunking climate-change deniers, I am happy that discrediting him is both morally good and scientifically correct (and I think that’s not a coincidence).

There are two main problems with both the cars story and the marriage story. First is selection into the independent variable condition (marriage and car ownership). People end up in these conditions partly because of their values on the dependent variable. For example, women in marriages are less likely to be raped on average because women don’t want to marry men who have raped them, or likely will rape them — the absence of rape causes marriage. In the case of children with disabilities, there is evidence that children’s disabilities increase the odds their parents will divorce (which means at least one of the parents isn’t in the household and so can’t be a car-commuting parent in the ACS data).

The other main problem is omitted variables. Other things cause both family violence and children’s health, and these are not adequately controlled even if researchers tell you they control for them. Controlling for household income (and other easily-measured demographics) does not capture all the benefits and privileges that married (or car-owning) people have and transfer to their children. For tricky questions of selection and omitted variables, we need to get closer to experimental conditions in order to provide causal explanations.
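The omitted-variable problem is easy to demonstrate with a toy simulation. In this sketch (entirely made-up data, not the ACS or World Bank figures), an unmeasured “privilege” factor drives both car ownership and child health, while cars themselves have zero causal effect — yet the naive regression finds a strong “protective” coefficient on cars:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Unmeasured "privilege" drives both car ownership and child health;
# cars themselves have ZERO causal effect on health in this simulation.
privilege = rng.normal(0, 1, n)
cars = privilege + rng.normal(0, 1, n)
health = 2.0 * privilege + rng.normal(0, 1, n)

def ols_slope(y, *predictors):
    """Return the coefficient on the first predictor from an OLS fit."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

naive = ols_slope(health, cars)                # omits privilege
adjusted = ols_slope(health, cars, privilege)  # includes the true confounder

print(round(naive, 1))     # about 1.0: cars look strongly "protective"
print(round(adjusted, 1))  # about 0.0: the "effect" vanishes
```

Controlling for household income in the real analyses is like controlling for a noisy proxy of the confounder here: it shrinks the spurious coefficient but does not eliminate it.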

Tips for critical reading

So, based on Wilcox’s car story and my car story, here are practical tips to help you avoid getting hoodwinked by a propagandist with a PhD — or a data journalist looking at a mountain of data and a tight deadline. These are some things to watch out for:

Scatter plot proof

Watch for impressive bivariate relationships, presented with a mention of control variables but no report of the adjusted effect size. That’s what I did with my scatter plots above. Mentioning adjusted results without showing them sells a small net effect under a big unadjusted label. (Wilcox examples here; Mark Regnerus does this, too.)

Axis truncation

A classic example is the Obama food stamp meme, but Wilcox had a great example a few years ago when he wanted to show the drop in divorce that resulted from hard times pulling families together during the recession. If you assume divorce is always going up (it fell for decades), this looks like a dramatic change (he called it “the first annual dip since 2005”):

No head-to-head comparison of alternative explanations

This is a lot to ask, but real social scientists take seriously the alternative explanations for what they observe, and try to devise ways to test them against each other. Editors often see this as low-hanging fruit for removal, because cutting it both shortens the piece and strengthens the argument. In the rape versus marriage story, Wilcox nodded to the alternative explanation that “women in healthy, safe relationships are more likely to select into marriage” — which he called “part of the story” — but he offered nothing to help a reader or editor adjudicate the relative size of that “part” of the story. This connects to the next red flag.

Greater than zero proof

Sometimes just showing that something exists at all is offered as evidence of its importance. That’s why I included three anecdotes about children being saved by private passenger cars — it happened, it’s real. The trick is to identify whether something matters in addition to existing. Here’s a Wilcox example where he showed that a tiny number of people said they didn’t divorce because of the recession; here’s an example in which Nate Cohn at the NYTimes Upshot said that 2% of Hispanics changing their race to White was “evidence consistent with the theory that Hispanics may assimilate as white Americans.” Neither of these provides any comparison to show how important these discoveries were relative to anything else — other reasons people delay divorce? other reasons for race-code changes? — they just exist. This is reasonable if you’re discovering a new subatomic particle, but with social behavior it’s less impressive.

Piles of studies

The reason I presented the car results as three separate “studies” was to make the point that you can have a lot of studies, but if none of them prove your point it doesn’t matter. For example, in his post Wilcox linked to a series of publications about how children whose parents weren’t married were more likely to be sexually abused, but none of them handle the problem of selection into marriage I described above. Similarly, a generation of research showed that women who have babies as teenagers suffer negative economic consequences, but those effects were all exaggerated because people didn’t take selection into account (women with poor economic prospects are more likely to have babies as teenagers).

Describing one side of inequality as a social good

Let’s say that, in street fights, the person with a gun beats the person with a knife more than 50% of the time. Do we conclude people should have more guns? Some benefits are absolute and have no zero-sum quality to them. (I can’t think of any, but I assume there are some.) Normally, however, we’re talking about relative benefits. The benefits of marriage, or the economic benefits of education, are measured relative to people who aren’t married or schooled.

The typical description of such a pattern is, “This causes a good outcome, we should have more of it.” But we should always consider whether the best thing, socially, might be to reduce the benefit — that is, solve the problems of the people who don’t have the asset in question — rather than try to increase the number of people with the asset.

The benefit of cars that comes from being able to get to the hospital quicker may only be relative to the poor suckers stuck in an ambulance while your personal cars are blocking up Manhattan.

Ambulance stuck in Manhattan, by Philip Cohen

Appendix: Regression results

regs

Check that: Most marrying people are remarrying above age 31

The other day I wrote that the majority of people marrying over age 35 have been married before. That is true, but because of the way I handled the age categories it’s not specific enough. In fact, the majority of men marrying over age 30, and the majority of women marrying over age 28, have been married before.

Here are the details, in two charts, both using marital events data from the 2012 American Community Survey from IPUMS.org. The first shows the breakdown between first-married and previously-married people marrying at each age. It is not until age 40 for men, and age 38 for women, that previously-married people become the majority marrying at each age. These proportions reach two thirds in the mid-40s and surpass 80% by age 52:

timesmarriedmarrying-area

But the percent remarrying at or above a given age is higher. Here is that pattern, showing that we enter majority-remarried territory at 31 for men and 29 for women:

timesmarriedmarrying-lines

The rates of remarriage at a given age may matter more in practice, but this is a neat way to look at it.

Note there is no demographic reason that these patterns must hold. If remarriage were taboo or more restricted this would not be the case. Being ever married cannot be revoked (unless people lie to the Census Bureau), so the percent ever-married should never decline for a cohort (unless the ever-married have much higher mortality or emigrate more than the never-married, which is very unlikely). But ever-married proportions for the population don’t have to rise with age in a given cross-section, even if you don’t just look at people marrying right now. If marriage were becoming more common on a cohort basis, for example (which it is not), you could see higher ever-married rates among young people than among old people.
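The cohort logic above can be sketched with made-up numbers: within each cohort the ever-married share can only rise with age, yet a single cross-section can still show the young more ever-married than the old, if younger cohorts marry more.

```python
# Hypothetical cumulative ever-married shares by age, for two cohorts
# (invented values, purely to illustrate the logic)
older = {25: 0.20, 35: 0.35, 45: 0.40}    # low-marriage cohort
younger = {25: 0.50, 35: 0.70, 45: 0.80}  # high-marriage cohort

# Within each cohort, ever-married can never decline with age
assert older[25] <= older[35] <= older[45]
assert younger[25] <= younger[35] <= younger[45]

# In today's cross-section we observe the younger cohort at 25 and the
# older cohort at 45 -- and ever-married FALLS with age
cross_section = {25: younger[25], 45: older[45]}
print(cross_section)  # {25: 0.5, 45: 0.4}
```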

How well do teen test scores predict adult income?

Now with new figures and notes added at the end — and a new, real life headline and graph illustrating the problem in the middle!

The short answer is, pretty well. But that’s not really the point.

In a previous post I complained about various ways of collapsing data before plotting it. Although this is useful at times, and inevitable to varying degrees, the main danger is the risk of inflating how strong an effect seems. So that’s the point about teen test scores and adult income.

If someone told you that the test scores people get in their late teens were highly correlated with their incomes later in life, you probably wouldn’t be surprised. If I said the correlation was .35, on a scale of 0 to 1, that would seem like a strong relationship. And it is. That’s what I got using the National Longitudinal Survey of Youth. I compared the Armed Forces Qualification Test scores, taken in 1999, when the respondents were ages 15-19, with their household income in 2011, when they were 27-31.*

Here is the linear fit between these two measures, with the 95% confidence interval shaded, showing just how confident we can be in this incredibly strong relationship:

afqt-linear

That’s definitely enough for a screaming headline, “How your kids’ test scores tell you whether they will be rich or poor.”

In fact, since I originally wrote this, the Washington Post Wonkblog published a post with the headline, “Here’s how much your high school grades predict your future salary,” with this incredibly tidy graph:

earnings-gpa

No doubt these are strong relationships. My correlation of .35 means AFQT explains 12% of the variation in household income. But take heart, ye parents in the age of uncertainty: 12% of the variation leaves a lot left over. This variable can’t account for how creative your children are, how sociable, how attractive, how driven, how entitled, how connected, or how White they may be. To get a sense of all the other things that matter, here is the same data, with the same regression line, but now with all 5,248 individual points plotted as well (which means we have to rescale the y-axis):

afqt-scatter

Each dot is a person’s life — or two aspects of it, anyway — with the virtually infinite sources of variability that make up the wonder of social existence. All of a sudden that strong relationship doesn’t feel like something you can bank on with any given individual. Yes, there are very few people from the bottom of the test-score distribution who are now in the richest households (those clipped by the survey’s topcode and pegged at 3 on my scale), and hardly anyone from the top of the test-score distribution who is now completely broke.
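The arithmetic behind “explains 12%” is just the squared correlation, and a quick simulation (made-up data with the same r and n, not the NLSY itself) shows how it works:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5248  # same sample size as the NLSY analysis above
r = 0.35  # target correlation

# Construct two variables with population correlation r
score = rng.normal(0, 1, n)
income = r * score + np.sqrt(1 - r**2) * rng.normal(0, 1, n)

r_hat = np.corrcoef(score, income)[0, 1]
print(round(r_hat**2, 2))  # share of variance "explained": about 0.12
```

Plot these two simulated variables and you get a cloud much like the real scatter: a clearly tilted trend, surrounded by enormous spread.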

But I would guess that for most kids a better predictor of future income would be spending an hour interviewing their parents and high school teachers, or spending a day getting to know them as a teenager. But that’s just a guess (and that’s an inefficient way to capture large-scale patterns).

I’m not here to argue about how much various measures matter for future income, or whether there is such a thing as general intelligence, or how heritable it is (my opinion is that a test such as this, at this age, measures what people have learned much more than a disposition toward learning inherent at birth). I just want to give a visual example of how even a very strong relationship in social science usually represents a very messy reality.

Post-publication addendums

1. Prediction intervals

I probably first wrote about this difference between the slope and the variation around the slope two years ago, in a futile argument against the use of second-person headlines such as “Homophobic? Maybe You’re Gay.” Those headlines always try to turn research into personal advice, and are almost always wrong.

Carter Butts, in personal correspondence, offered an explanation that helps make this clear. The “you” type headline presents a situation in which you — the reader — are offered the chance to add yourself to the study. In that case, your outcome (the “new response” in his note) is determined by both the line and the variation around the line. Carter writes:

the prediction interval for a new response has to take into account not only the (predicted) expectation, but also the (predicted) variation around that expectation. A typical example is attached; I generated simulated data (N=1000) via the indicated formula, and then just regressed y on x. As you’d expect, the confidence bands (red) are quite narrow, but the prediction bands (green) are large – in the true model, they would have a total width of approximately 1, and the estimated model is quite close to that. Your post nicely illustrated that the precision with which we can estimate a mean effect is not equivalent to the variation accounted for by that mean effect; a complementary observation is that the precision with which we can estimate a mean effect is not equivalent to the accuracy with which we can predict a new observation. Nothing deep about that … just the practical points that (1) when people are looking at an interval, they need to be wary of whether it is a confidence interval or a prediction interval; and (2) prediction interval can (and often should be) wide, even if the model is “good” in the sense of being well-estimated.

And here is his figure. “You” are very likely to be between the green lines, but not so likely to be between the red ones.

CarterButtsPredictionInterval
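In the spirit of Carter’s simulation (with my own invented parameters, not his), the two kinds of bands can be computed directly for a simple regression, using the standard textbook formulas:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000
x = rng.uniform(-1, 1, n)
y = 2.0 * x + rng.normal(0, 0.25, n)  # true sd around the line = 0.25

# Least-squares fit of y = a + b*x
b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
a = y.mean() - b * x.mean()
s = np.sqrt(np.sum((y - (a + b * x))**2) / (n - 2))  # residual sd

x0 = 0.0  # evaluate both bands at x = 0
sxx = np.sum((x - x.mean())**2)
se_mean = s * np.sqrt(1/n + (x0 - x.mean())**2 / sxx)     # confidence band
se_new = s * np.sqrt(1 + 1/n + (x0 - x.mean())**2 / sxx)  # prediction band

# 95% half-widths (z of about 1.96 for n this large)
print(round(1.96 * se_mean, 2))  # narrow: roughly 0.02
print(round(1.96 * se_new, 2))   # wide: roughly 0.5 (total width about 1)
```

The confidence band tells you where the average response sits; the prediction band tells you where “you” might land.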

2. Random other variables

I didn’t get into the substantive issues, which are outside my expertise. However, one suggestion I got was interesting: What about happiness? Without endorsing the concept of “life satisfaction” as measured by a single question, I still think this is a nice addition because it underscores the point of wide variation in how this relationship between test scores and income might be experienced.

So here is the same figure, but with the individuals coded according to how they answered the following question in 2008, when they were age 24-28, “All things considered, how satisfied are you with your life as a whole these days? Please give me an answer from 1 to 10, where 1 means extremely dissatisfied and 10 means extremely satisfied.” In the figure, Blue is least satisfied (1-6; 21%), Orange is moderately satisfied (7-8; 46%), and Green is most satisfied (9-10; 32%):

afqt-scatter-satisfied

Even if you squint you probably can’t discern the pattern. Life satisfaction is positively correlated with income at .16, and less so with test scores (.07). Again, significant correlation — not helpful for planning your life.

* I actually used something similar to AFQT: the variable ASVAB, which combines tests of mathematical knowledge, arithmetic reasoning, word knowledge, and paragraph comprehension, and scales them from 0 to 100. For household income, I used a measure of household income relative to the poverty line (adjusted for household size), plus one, and transformed by natural log. I used household income because some good test-takers might marry someone with a high income, or have fewer people in their households — good decisions if your goal is maximizing household income per person.

What’s in a ratio? Teen birth and marriage edition

Even in our post-apocalypse world, births and marriages are still related, somehow.

Some teenage women get married, and some have babies. Are they the same women? First the relationship between the two across states, then a puzzle.

In the years 2008-2012 combined, 2.5 percent of women ages 15-19 per year had a baby, and 1 percent got married. That is, they were reported in the American Community Survey (IPUMS) to have given birth, or gotten married, in the 12 months before they were surveyed. Here’s the relationship between those two rates across states:

teenbirthmarriage1

The teen birth rate ranges from a low of 1.2 percent in New Hampshire to 4.4 percent in New Mexico. The teen marriage rate ranges from .13 percent in Vermont to 2.3 percent in Idaho.

But how many of these weddings are “shotgun weddings” — those where the marriage takes place after the pregnancy begins? And how many of these births are “gung-ho marriages” — those where the pregnancy follows immediately after the marriage? (OK, I made that term up.) The ACS, which is wonderful for having these questions, is somewhat maddening in not nailing down the timing more precisely. “In the past 12 months” is all you get.

Here is the relationship between two ratios. The x-axis is percentage of teens who got married who also had a birth (birth/marriage). On the y-axis is the percent of teens who had a birth who also got married (marriage/birth).

teenbirthmarriage

If you can figure out how to interpret these numbers, and the difference between them within states, please post your answer in the comments.
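To make the two axes concrete, here is how the two ratios would be computed from raw counts. These numbers are invented for illustration, not taken from the ACS:

```python
# Invented counts for one state's teen women (ages 15-19)
marriages = 1_000  # married in the past 12 months
births = 2_500     # gave birth in the past 12 months
both = 400         # reported both events in the past 12 months

# x-axis: of teens who married, what percent also had a birth?
pct_marriers_with_birth = 100 * both / marriages
# y-axis: of teens who had a birth, what percent also married?
pct_mothers_who_married = 100 * both / births

print(pct_marriers_with_birth)  # 40.0
print(pct_mothers_who_married)  # 16.0
```

The same overlap count divided by two different denominators gives two very different percentages, which is part of what makes the state comparison puzzling.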


Does happy marriage cause happy marriage?

I don’t know how I missed this one, from two Valentine’s Days ago…

For an introductory methods course discussion on when something causes something else. Question: Are happier couples happier? Some writers think so:

I can see the study design now: a randomized group of couples were given coupons for date nights, and some time later were compared with a control group without the coupons. Or not. Cosmo summarized:

For their study, researchers from the University of Virginia’s National Marriage Project surveyed 1,600 couples and asked them about everything from relationship satisfaction to sex. They discovered that couples who spend at least one night a week alone together say they’re more committed to their relationship than those who don’t hang out together as much.

(The report, by Brad Wilcox and Jeffery Dew and posted online at the National Marriage Project, is here.)

BREAKING: Researchers discover murder less common among happy couples.

Is that it? A simple association between being together and being happy? Almost. First, they say (there are no tables) that they “control for factors such as income, age, education, race, and ethnicity.” Such as? Anyway.

Second, they also claim to have analyzed historical data from the National Survey of Families and Households (1987-1994). They write:

Because we had data from spouses at two time points in the NSFH, we were also able to examine the direction of effects—to determine whether or not couple time reported during the first wave of the survey was associated with marital quality at the second wave. Here, the more couple time individuals reported at the time of the first survey, the more likely they were to be very happy in their marriage at the second survey, five years later. Although the NSFH evidence does not provide us with definitive proof that couple time causes increases in marital quality, the longitudinal character of the data suggests that the relationship may indeed be causal.

So, Wilcox and Dew point #1: If something happened before something else, “the relationship may indeed be causal.” They go on:

It is certainly intuitively true that greater satisfaction with one’s partner should also lead to more time spent in positive, shared activities. Nevertheless, it would be absurd to assume that two partners who intentionally set out to increase positive couple time spent together would typically not benefit from such time with increases in connection and happiness.

So, point #2 is, We already knew the answer before we did the research, because it’s flipping obvious, so who cares about this analysis — it’s almost Valentine’s Day!

There are ways to actually get at “the direction of effects,” like the randomized trial I suggested, or even using longitudinal data and assessing changes in happiness, or controlling for happiness at time 1. Not this.

Anyway, can we think of examples of things that occur before other things without causing them? Here are a few off the top of my head:

  • One sibling dies of a genetic disease now, and then the other one dies from the same disease later: Shocking new evidence that genetics works sideways!
  • Someone has tennis elbow now, and is playing sports later: The surprising way that getting hurt makes you athletic!
  • People who spend more money now have more money later: The more you spend, the more you save!
  • And of course, people who have a lot of sex now are good looking later: Sex up your looks!

I’m open to suggestions for better examples.

Note: I guess in some social science neighborhoods it’s common to analyze the effects of extremely similar things on each other, like pleasure being associated with happiness, or strong left arms being associated with strong right legs. Dew and Wilcox actually published a peer-reviewed article, using this survey, on the association between small acts of kindness in marriage and marital satisfaction. And the result? Couples who are nice to each other are happier.

Change scatter plots

I never read Edward Tufte’s book The Visual Display of Quantitative Information before. (I have a lot of practice but almost no training in visual presentation of data.)

How do you describe the change in one variable between two points in time? Here’s an example of a “slopegraph” of the kind Tufte likes (many examples here). He takes a list of 15 countries’ government receipts as percentage of GDP for 1970 and 1979, and produces this simple graph:

tufteexample

He likes it because all the ink is data (he’s inexplicably invested in the conservation of ink). And he likes how it’s easy to see the change for each country, as well as the two ranked lists for each time point, and those with unusual changes, such as Britain, the only country with a decline. Those are strengths, and this kind of graph is often great. An alternative is a change scatter plot. Here it is with the same data:

tuftestata

In this you can see the overall upward movement (points above the red line), and specifics such as the three countries that moved as a group from the 40-50 percent range to the 50-60 percent range. It also allows a vertical reading, to make comparisons between countries that started the 1970s similarly, such as Switzerland and Greece, Italy and the US, Belgium and Canada — to see how they diverged, with Switzerland, Italy, and Belgium all moving up more during the decade.
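A minimal sketch of such a change scatter plot, using matplotlib with invented values standing in for Tufte’s table (the country names are real but the numbers here are illustrative, not his figures):

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen, no display needed
import matplotlib.pyplot as plt

# Invented receipts-as-percent-of-GDP values (1970, 1979),
# standing in for Tufte's table
data = {
    "Sweden": (46, 57),
    "Netherlands": (44, 55),
    "Britain": (41, 39),  # the one decline
    "France": (39, 43),
    "USA": (30, 32),
}
x = [v[0] for v in data.values()]  # 1970 values
y = [v[1] for v in data.values()]  # 1979 values

fig, ax = plt.subplots()
ax.scatter(x, y)
ax.plot([25, 60], [25, 60], color="red")  # no-change diagonal
for name, (x0, y0) in data.items():
    ax.annotate(name, (x0, y0))
ax.set_xlabel("Receipts, 1970 (% of GDP)")
ax.set_ylabel("Receipts, 1979 (% of GDP)")
fig.savefig("change-scatter.png")
```

Points above the red diagonal increased over the decade; Britain sits below it.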

I’ve used it in a few cases before, like this graph on changes in marriage rates across 26 countries:

ipums-international-marriage2

I think the scatter plot approach is especially helpful when you want to see how the change differs at different points in a distribution, or when there are lots of data points.

In a figure from this paper on gender segregation among managers we used it to show how the pace of women’s advance into managerial occupations stalled in the 1990s, by overlaying changes from two time periods on the same figure:

wo-scatter

The fact that these lines are essentially parallel is useful and clearly shown. You could make this graph as a slopegraph with three columns, showing two changes, but I don’t think you’d see the pattern as well.

Here’s one I made for something else but haven’t used yet, showing the decline of manufacturing in 50 large metro areas over three decades. In this one they’re all compared with 1980, creating vertical columns of white, gray and black dots over each MA’s 1980 starting point.

ma-manufacturing

Tufte would call all that white space above the diagonal a big waste.

In the Tufte example above there aren’t many cases so you could label them all. In my marriage example you can figure out the countries based on short abbreviations because the names are familiar. And in the managerial occupations or metro areas it’s the shape of the cloud that matters, so it’s OK not to label them.

Here is an example with a lot of cases, each of which is labeled, from an op-ed by Stephanie Coontz in the New York Times, showing the change in the gender composition of occupations from 1980 to 2010. This one adds a categorical scheme that is supposed to make the types of changes more easily discernible. So those in the top gray box are female-dominated, those in the bottom gray box are male-dominated, and those in the middle are integrated. Green lines denote occupations that entered the integrated zone; red lines denote occupations that became more segregated.

30coontz-gr1-popup-v2

This has a lot of information, but it doesn’t do much more for me than a table would. And the categorical color scheme hides a number of occupations that changed a lot but remained within the arbitrary categories (gray lines). By converting it to a change scatter plot, you can get a sense of the overall pattern of change, and still isolate those with big changes. In the version here I’ve only tagged the ones that changed 20 percentage points or more, so a lot of information is lost, but the graph is a lot smaller, so you could afford to add some text with additional detail.

tufte-nyt

Here you quickly see that most occupations became more female. And there is a clump of occupations that changed a lot but remained in the middle-range category — medical, education, and human resource managers, and accountants. These were grayed out in the Times version, but they integrated dramatically so you should notice them.

This might not be the best example, but I like this method of showing within-case changes over time.

Marriage promotion: That’s some fine print

In a (paywalled) article in the journal Family Relations, Alan Hawkins, Paul Amato, and Andrea Kinghorn attempt to show that $600 million in marriage promotion money (taken from the welfare program!) has had beneficial effects at the population level. A couple quick comments on the article (see also previous posts on marriage promotion).

After a literature review that is a model of selective and skewed reading of previous research (worth reading just for that), they use state marriage promotion funding levels* in a year- and state-fixed effects model to predict the percentage of the population that is married, divorced, children living with two parents, one parent, nonmarital births, poverty and near-poverty, each in separate models with no control variables, for the years 2000-2010 using the American Community Survey.

To find beneficial effects — no easy task, apparently — they first arbitrarily divided the years into two periods. Here is the rationale for that:

We hypothesized that any HMI [Healthy Marriage Initiative] effects were weaker (or nonexistent) early in the decade (when funding levels were uniformly low) and stronger in the second half of the decade (when funding levels were at their peak).

This doesn’t make sense to me. If funding levels were low and there was no effect in the early period, and then funding levels rose and effects emerged in the later period, then the model for all years should show that funding had an effect. Correct me if I’m wrong, but I don’t think this passes the smell test.

Then they report their beneficial effects, which are significant if you allow them p<.10 as a cutoff, which is kosher under house rules because they had directional hypotheses.

However, then they admit their effects are only significant because they included Washington, DC. That city had per capita funding levels about nine times the mean (“about $22” versus “about $2.50”), and had an improving family well-being profile during the period (how much of an outlier DC is on the dependent variables they didn’t discuss, and I don’t have time to show it now, but I reckon it’s pretty extreme, too). To deal with this extreme outlier, they first cut the independent variable in half for DC, bringing it down to about 4.4 times the mean and a third higher than the next most-extreme state, Oklahoma (itself pretty extreme). That change alone cut the number of significant effects down from six to three.

coupdegrace

Then, in the tragic coup de grâce of their own paper, they remove DC from the analysis, and nothing is left. They don’t quite see it that way, however:

But with the District of Columbia excluded from the data (right panel of Table 3), all of the results were reduced to nonsignificance. Once again, most of the regression coefficients in this final analysis were comparable to those in Table 2 (right panel) in direction and magnitude, but they were rendered nonsignificant by a further increase in the size of the standard errors.

Really. What does “comparable in direction and magnitude” mean, exactly? I give you (for free!) the two tables. First, the full model:

tab2

Then, the models with DC rescaled or removed (they’re talking about the comparison between the right-hand panel in both tables):

tab3

Some of the coefficients actually grew in the direction they want with DC gone. But two moved drastically away from the direction of their preferred outcome: the two-parent coefficient is 44% smaller, the poor/near-poor coefficient fell 78%.

Some outlier! As they helpfully explain, “The lack of significance can be explained by the larger standard errors.” In the first adjustment, rescaling DC, all the standard errors at least doubled. And all of the standard errors are at least three times larger with DC gone. I’m not a medical doctor, but I think it’s fair to say that when removing one case triples your standard errors, your regression model is not feeling well.
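The mechanics here are simple: an extreme value on the predictor inflates the variance of x, which shrinks the standard error of its slope; drop that one case and the standard error balloons. A minimal sketch with fabricated, DC-like numbers (not the actual study data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Fifty fake "states" with per capita funding around $2.50,
# plus one DC-like outlier at $22.
x = rng.normal(2.5, 0.5, 50)
y = 0.1 * x + rng.normal(0, 1, 50)
x_dc, y_dc = 22.0, 2.0

def slope_and_se(x, y):
    """OLS slope and its standard error for a simple regression."""
    xbar, ybar = x.mean(), y.mean()
    sxx = ((x - xbar) ** 2).sum()
    b = ((x - xbar) * (y - ybar)).sum() / sxx
    resid = y - (ybar + b * (x - xbar))
    s2 = (resid ** 2).sum() / (len(x) - 2)
    return b, np.sqrt(s2 / sxx)

b_with, se_with = slope_and_se(np.append(x, x_dc), np.append(y, y_dc))
b_without, se_without = slope_and_se(x, y)
print(f"with outlier:    b={b_with:.3f}, se={se_with:.3f}")
print(f"without outlier: b={b_without:.3f}, se={se_without:.3f}")
# The one extreme x-value dominates sum((x - xbar)^2), so removing it
# shrinks that sum and inflates the standard error of the slope.
```

In other words, the "significant" result leans almost entirely on the leverage of one case.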

One other comment on DC. Any outlier that extreme is a serious problem for regression analysis, obviously. But there is a substantive issue here as well. They feebly attempt to turn the DC results in their favor, by talking about its unique conditions. But what they don’t do is consider the implications of DC’s unique change over this time for their analysis. And that’s what matters in a year- and state-fixed effects model. How did DC change independently of marriage promotion funds? Most importantly, 8% of the population during 2006-2010 was new to town each year. That’s four times the national average of in-migration in that period. This churning is of course a problem for their analysis, which is trying to measure cumulative effects of program spending in that place — hard to do when so many people moved there after the spending occurred. But it’s also not random churning: the DC population went from 57% Black to 52% Black in just five years. DC is changing, and it’s not because of marriage promotion programs.

Finally, their own attempt at a self-serving conclusion is the most damning:

Despite the limitations, the current study is the most extensive and rigorous investigation to date of the implications of government-supported HMIs for family change at the population level.

Ouch. Oh well. Anyway, please keep giving the programs money, and us money for studying them**:

In sum, the evidence from a variety of studies with different approaches targeting different populations suggests a potential for positive demographic change resulting from funding of [Marriage and Relationship Education] programs, but considerable uncertainty still remains. Given this uncertainty, more research is needed to determine whether these programs are accomplishing their goals and worthy of continued support.

*The link to their data source is broken. They say they got other data by calling around.

**The lead author, Alan Hawkins, has received about $120,000 in funding from various marriage promotion sources.

Marriage makes Wilcox richer, how to lie with asterisks edition

Brad Wilcox wrote a blog post for the Atlantic the other day, in which he described the well-known pattern by which children of married parents on average grow up richer and more highly educated than those raised by single parents. (Follow Wilcox’s lies, errors, and shenanigans under this tag.)

It’s old news, but before I make today’s point, here are a few reasons this kind of thing is wrong and useless.

1. Although the headline says, “Marriage Makes Our Children Richer,” the data Wilcox shows does not approach a causal model. Comparing children who lived with married parents as adolescents to those who did not when they are young adults, he uses controls for mother’s education, race/ethnicity, and household income. Those married parents differ from the single parents in many more ways than that, and did before they got married. Wilcox and actual researchers know this. The Atlantic business editor apparently doesn’t.

2. Even to the extent that marriage helps married people, which it does, on average, this does not imply that mothers who are currently not getting married would get those benefits if they did get married. Because, who are they going to marry? If rich-prince-charming were there, most of them would have married him already. So to consider the effects of them marrying you have to take into account that it’s not the right guy or the right relationship at the right time. So, good luck.

3. Finally, so, you gonna promote marriage? We’ve seen how that works. On the other hand, we know we can mitigate a lot of the harm from difficult childhoods by throwing jobs and money at children’s food, healthcare, and education needs. If you care about poverty and inequality more than marriage, that’s the way to go.

Anyway, my complaint today is about a particular kind of deception that Wilcox likes to engage in, which Mark Regnerus also did in his infamous paper. The trick is to display unadjusted figures, but describe them as if they include statistical controls. First, how Wilcox did it this time, then a simple example of how wrong it is.

Wilcox shows this figure, among others:

WilcoxAtlanticScreenShot

This is supposedly how marriage makes children richer, because most of the blue bars (“intact family”) are taller than the red bars (“non-intact family”). Set aside what should be the obvious conclusion: having a mother who went to college matters much more than whether your parents were married (which we also already knew). I want to focus on the little symbols *^, which indicate a statistically significant difference with the different controls he used. This is his footnote:

An asterisk (*) indicates a statistically-significant difference (p < 0.05) between respondents who lived with both, married biological parents at Wave I compared with respondents from other family structures, controlling for respondent’s age and race/ethnicity. A hat (^) indicates that there was still a statistically-significant difference when Wave I household income was added as an additional control.

But the numbers shown in the figure are not adjusted for those controls. Presumably, the family structure differences would be smaller with the controls — and they’re already pretty small.

We don’t have his underlying numbers (and I wouldn’t expect to see them in a peer-reviewed journal anytime soon; Regnerus never reported his). So I made a simple example to show how misleading this is. I took the employed 25-55 year-old non-Hispanic White and Black men from the 2011 American Community Survey (excluding the richest 5%) and compared their earnings with and without controls for education, age, hours and weeks worked in the previous year, and marital status. The question is, how much more do White men earn? These are the simple regression results:

wilcox-eg

In the first model, the intercept is the mean earnings for White men, and the Black coefficient is the difference between the White and Black means. This is the unadjusted difference — $13,551 — which is the equivalent of what Wilcox plotted in the graph. But with the controls the difference is reduced to $5,498 — a big difference. The difference is illuminating because it shows how much of the overall gap is accounted for by the distribution of the control variables for Black versus White men.
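The trick is easy to see with fabricated numbers: regress a made-up earnings variable on a group dummy, with and without a control that differs between groups, and watch the coefficient shrink. A sketch (all numbers invented, not the ACS data):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Invented earnings data: a binary group indicator and one control
# (years of education) that differs between the groups on average.
group = rng.integers(0, 2, n)                # 1 = comparison group
educ = 13 - 2 * group + rng.normal(0, 2, n)  # group 1 averages 2 fewer years
earnings = 20000 + 3000 * educ - 3000 * group + rng.normal(0, 5000, n)

def ols(X, y):
    """Least-squares coefficients with an intercept prepended."""
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

gap_unadj = ols(group.reshape(-1, 1), earnings)[1]        # raw difference
gap_adj = ols(np.column_stack([group, educ]), earnings)[1]  # with control
print(f"unadjusted gap: {gap_unadj:,.0f}")
print(f"adjusted gap:   {gap_adj:,.0f}")
```

Here the unadjusted gap is roughly three times the adjusted one, because the control variable accounts for most of it. Plotting the unadjusted bars while flagging significance "with controls" sells the big number with the small number's asterisk.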

If Wilcox did this exercise, however, he would produce a graph like this:

wilcox-eg2

See how he did that? He’s selling a $5,498 difference with a $13,551 label. He did the same dishonest thing in his “Knot Yet” report, with Kay Hymowitz and others.

When your audience is ideological foundation bigwigs and credulous (at best) editors, these asterisks and footnotes just make you look smart. These people are apparently impervious to honest reasoning. For the rest of us, at least, it can be a lesson in how not to do research.

ADDENDUM: How should you do it?

Conrad Hacket below asks what I suggest as a better way to represent the data. Sometimes the unadjusted difference is important even if it is statistically accounted for by some control variable. In the case of race differences in earnings, for example, the fact that there is a $13k+ gap is itself socially important. However, if you are going to make some argument about its importance net of the controls, this is how I would do it, given this very simple linear model, with no interactions or any fancy stuff (note I used non-transformed earnings and censored the top 5% — those at $150k+ — so that the coefficients would be easily interpretable in dollars without being too skewed by the richy-rich).

Using the regression coefficients and the grand means, you sum the products of the means and coefficients for each group, like this:

wilcox-eg3

And then graph the results with a label like this:

wilcox-eg4

Another reasonable strategy instead of using the grand means is to use a common scenario for the calculation, such as a married high school graduate, age 35, who works full-time year-round. Or various other methods of obtaining predicted values.
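The predicted-values calculation is just coefficients times means, summed. A sketch with the same kind of invented data as above (not the ACS numbers):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
group = rng.integers(0, 2, n)
educ = 13 - 2 * group + rng.normal(0, 2, n)
earnings = 20000 + 3000 * educ - 3000 * group + rng.normal(0, 5000, n)

# Fit earnings on the group dummy plus the control (intercept first).
X = np.column_stack([np.ones(n), group, educ])
b0, b_group, b_educ = np.linalg.lstsq(X, earnings, rcond=None)[0]

# Predicted earnings for each group, holding education at its grand mean.
xbar = educ.mean()
pred_0 = b0 + b_group * 0 + b_educ * xbar
pred_1 = b0 + b_group * 1 + b_educ * xbar
print(f"group 0: {pred_0:,.0f}  group 1: {pred_1:,.0f}")
# With a simple linear model, the gap between these two predicted
# values equals the adjusted group coefficient exactly.
```

Those two predicted values are what the bars should show if the label is going to claim the difference is net of controls.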

 

That number you want, it is not precise (women’s labor force edition)

Everyone wants a number. You want to know if the number is different from last year, or 100 years ago. Numbers are great. But the number you’re using is usually a statistic, a number calculated from a sample drawn from a population. You want a good number, you need a good sample. And a big one. And that’s going to cost you.

Who didn’t love the news recently that single British men ages 18-25 change their bedsheets only four times a year? Really? Really. How does anyone know this? Ergoflex, a memory-foam mattress distributor. At least UPI had the decency to report, “No survey details were provided,” although somehow Time found out the sample size was 2,004 (men and women, all ages). Rubbish, I reckon, or bonkers, or whatever. No one can resist a number; methods details don’t make it into the tweet version of the press release.

Here’s a more answerable question: What is the labor force participation rate for married, college graduate women with children, ages 25-54 in the United States? I’d say 76.1% — plus or minus a percentage point — based on the gold standard for labor force data collection, the Current Population Survey, easily analyzable these days for free with the IPUMS online tool. That’s from a sample of 60,000 households with a 90+% response rate, at a cost of umpteen million taxpayer dollars (well spent).

Here’s the trend in that number from 1990 to 2012, with 95% confidence intervals, based on the sample size, as calculated by IPUMS:

cps-error-bars

As more women have gotten college degrees, and the CPS sample has been enlarged, the sample size for this trend has grown and the error bars have shrunk, from a spread of almost 3 points to just less than 2. Still, there are only 8,265 of these women in the sample.

Only! Hold that up to a Gallup or Pew poll and compare confidence intervals when they start dividing and subdividing their samples. (Nothing against them — they give us the information we need to know how much variance there is in the estimates they put out, and then most people [+/- 51%] ignore it.)

There aren’t many one-year changes in this trend that are statistically significant at conventional levels. Of course, with this sample size you could say with confidence the labor force participation rate was higher in the late 1990s than the early 1990s (but check the survey redesign in 1994…), and higher again in the late 2000s than in the early 2000s. But were 2007 and 2002 sample flukes? And if so, what about 2012?
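The confidence intervals behind those error bars come from the usual normal approximation for a proportion, with width proportional to 1/√n. A quick sketch (the rate and the small-subgroup size are illustrative):

```python
import math

def ci95(p, n):
    """Normal-approximation 95% confidence interval for a proportion."""
    half = 1.96 * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

# A 76.1% rate with a CPS-sized subsample vs. a much smaller subgroup.
for n in (8265, 200):
    lo, hi = ci95(0.761, n)
    print(f"n={n:>5}: {lo:.3f} to {hi:.3f} (width {hi - lo:.3f})")
```

With n around 8,265 the interval is a bit under 2 points wide; cut the subsample to a couple hundred and it balloons past 10, which is why subgroup blips should not be taken at face value.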

What about if you want a slightly smaller subgroup, say, Black married, college graduate women with children, ages 25-54. That’s a reasonable question. Here’s the trend (note the y-axis scale changed):

cps-error-bars-black

Now the sample size is a couple hundred and the confidence intervals are more than 6 points wide; there isn’t a pair of years in the trend that doesn’t have overlapping confidence intervals. And look at 2007 and 2012 — Black women are blipping in the opposite direction from the larger group in each of those years. Yes, if you put the whole Black trend in the blender with a time trend you have a significant decline of about a fifth of a point per year on average (and a sliver of this change is because of the increasing tendency of college graduates to be in grad school and not working — there are 13 of them in 2012, dragging down the participation rate by 0.6%). But don’t hang a lot on one year.

So, my advice for doing simple description:

  • Eyes on the prize: who cares what the exact number is? Is it a lot or little, going up or going down, higher or lower than some other group? That’s usually what matters.
  • Stick to data with reported methods
  • Know the size of your subsamples, try to get confidence intervals
  • Don’t fixate on (or report) small changes or differences (don’t use that second decimal place if the margin of error is 6%)
  • For trends, pool data from multiple years, or report moving averages
  • Spend tax money on surveys, not war
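The pooling/moving-average advice in the list above can be sketched in a few lines (the annual series here is invented):

```python
import numpy as np

# An invented noisy annual series: participation rates, 2000-2012.
years = np.arange(2000, 2013)
rng = np.random.default_rng(3)
rate = 0.75 + 0.002 * (years - 2000) + rng.normal(0, 0.01, len(years))

# A centered 3-year moving average smooths out single-year blips,
# at the cost of losing the first and last years.
smooth = np.convolve(rate, np.ones(3) / 3, mode="valid")
for y, r in zip(years[1:-1], smooth):
    print(f"{y}: {r:.3f}")
```

The smoothed series trades a little timeliness for a lot less chance of mistaking a sample fluke for a trend.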

Video segment on Regnerus and divorce studies

Over the summer Karen Sternheimer and I sat for an interview, and Norton Sociology has released a segment of the video, in which she asks about the Regnerus study on parents’ same-sex relationship history and child outcomes. I don’t have the references for my comments, but I think/hope they’re mostly true.

Click on the picture to go to the Youtube video: