Tag Archives: methods

’16 and Pregnant’ and less so


From Flickr/CC: https://flic.kr/p/6dcJgA

Regular readers know I have objections to the framing of teen pregnancy, as a thing generally and as a problem specifically, separate from the rising age at childbearing generally (see also, or follow the teen births tag).

In this debate, one economic analysis of the effect of the popular MTV show 16 and Pregnant has played an outsized role. Melissa Kearney and Phillip Levine showed that there was more decline in teen births in places where the show was popular, and attempted to establish that the relationship was causal: that the show makes people under age 20 want to have babies less. As Kearney put it in a video promoting the study: “the portrayal of teen pregnancy, and teen childbearing, is something they took as a cautionary tale.” (The paper also showed spikes in Twitter and Google activity related to birth control after the show aired.)

This was very big news for the marriage promotion people, because it was taken as evidence that cultural intervention “works” to affect family behavior — which really matters because so far they’ve spent $1 billion+ in welfare money on promoting marriage, with no effect (none), and they want more money.

The 16 and Pregnant paper has been cited to support statements such as:

  • Brad Wilcox: “Campaigns against smoking and teenage and unintended pregnancy have demonstrated that sustained efforts to change behavior can work.”
  • Washington Post: “By working with Hollywood to develop smart story lines on popular shows such as MTV’s ’16 and Pregnant’ and using innovative videos and social media to change norms, the [National Campaign to Prevent Teen and Unplanned Pregnancy] has helped teen pregnancy rates drop by nearly 60 percent since 1991.”
  • Boston Globe: “As evidence of his optimism, [Brad] Wilcox points to teen pregnancy, which has dropped by more than 50 percent since the early 1990s. ‘Most people assumed you couldn’t do much around something related to sex and pregnancy and parenthood,’ he said. ‘Then a consensus emerged across right and left, and that consensus was supported by public policy and social norms. . . . We were able to move the dial.’ A 2014 paper found that the popular MTV reality show ’16 and Pregnant’ alone was responsible for a 5.7 percent decline in teen pregnancy in the 18 months after its debut.”

I think a higher age at first birth is better for women overall, health permitting, but I don’t support that as a policy goal in the U.S. now, although I expect it would be an outcome of things I do support, like better health, education, and job opportunities for people of color and people who are poor.

Anyway, this is all just preamble to a new debate from a reanalysis and critique of the 16 and Pregnant paper. I haven’t worked through it enough to reach my own conclusions, and I’d like to hear from others who have. So I’m just sharing the links in sequence.

The initial paper, posted as a (non-peer reviewed) NBER Working Paper in 2014:

Media Influences on Social Outcomes: The Impact of MTV’s 16 and Pregnant on Teen Childbearing, by Melissa S. Kearney, Phillip B. Levine

This paper explores how specific media images affect adolescent attitudes and outcomes. The specific context examined is the widely viewed MTV franchise, 16 and Pregnant, a series of reality TV shows including the Teen Mom sequels, which follow the lives of pregnant teenagers during the end of their pregnancy and early days of motherhood. We investigate whether the show influenced teens’ interest in contraceptive use or abortion, and whether it ultimately altered teen childbearing outcomes. We use data from Google Trends and Twitter to document changes in searches and tweets resulting from the show, Nielsen ratings data to capture geographic variation in viewership, and Vital Statistics birth data to measure changes in teen birth rates. We find that 16 and Pregnant led to more searches and tweets regarding birth control and abortion, and ultimately led to a 5.7 percent reduction in teen births in the 18 months following its introduction. This accounts for around one-third of the overall decline in teen births in the United States during that period.

A revised version, with the same title but slightly different results, was then published in the top-ranked American Economic Review, which is peer-reviewed:

This paper explores the impact of the introduction of the widely viewed MTV reality show 16 and Pregnant on teen childbearing. Our main analysis relates geographic variation in changes in teen childbearing rates to viewership of the show. We implement an instrumental variables (IV) strategy using local area MTV ratings data from a pre-period to predict local area 16 and Pregnant ratings. The results imply that this show led to a 4.3 percent reduction in teen births. An examination of Google Trends and Twitter data suggest that the show led to increased interest in contraceptive use and abortion.

Then last month David A. Jaeger, Theodore J. Joyce, and Robert Kaestner posted a critique on the Institute for the Study of Labor working paper series, which is not peer-reviewed:

Does Reality TV Induce Real Effects? On the Questionable Association Between 16 and Pregnant and Teenage Childbearing

We reassess recent and widely reported evidence that the MTV program 16 and Pregnant played a major role in reducing teen birth rates in the U.S. since it began broadcasting in 2009 (Kearney and Levine, American Economic Review 2015). We find Kearney and Levine’s identification strategy to be problematic. Through a series of placebo and other tests, we show that the exclusion restriction of their instrumental variables approach is not valid and find that the assumption of common trends in birth rates between low and high MTV-watching areas is not met. We also reassess Kearney and Levine’s evidence from social media and show that it is fragile and highly sensitive to the choice of included periods and to the use of weights. We conclude that Kearney and Levine’s results are uninformative about the effect of 16 and Pregnant on teen birth rates.

And now Kearney and Levine have posted their response on the same site:

Does Reality TV Induce Real Effects? A Response to Jaeger, Joyce, and Kaestner (2016)

This paper presents a response to Jaeger, Joyce, and Kaestner’s (JJK) recent critique (IZA Discussion Paper No. 10317) of our 2015 paper “Media Influences on Social Outcomes: The Impact of MTV’s 16 and Pregnant on Teen Childbearing.” In terms of replication, those authors are able to confirm every result in our paper. In terms of reassessment, the substance of their critique rests on the claim that the parallel trends assumption, necessary to attribute causation to our findings, is not satisfied. We present three main responses: (1) there is no evidence of a parallel trends assumption violation during our sample window of 2005 through 2010; (2) the finding of a false placebo test result during one particular earlier window of time does not invalidate the finding of a discrete break in trend at the time of the show’s introduction; (3) the results of our analysis are robust to virtually all alternative econometric specifications and sample windows that JJK consider. We conclude that this critique does not pose a serious threat to the interpretation of our 2015 findings. We maintain the position that our earlier paper is informative about the causal effect of 16 and Pregnant on teen birth rates.


There are interesting methodological questions here. It’s hard to identify the effects of interventions that are swimming with the tide of change. In fact, the creation of the show, the show’s popularity, the campaign to end teen pregnancy, and the rising age at first birth may all be outcomes of the same general historical trend. So I’m not that invested in the answer to this question, though I am very interested.

There are also questions about the publication process, which I am very invested in. That’s why I work to promote a working paper culture among sociologists (through the SocArXiv project). The original paper was posted on a working paper site without peer review, but NBER is for economists who already are somebody, so that’s a kind of indirect screening. Then it was accepted in a top peer-reviewed journal (somewhat revised), but that was after it had received major attention and accolades, including a New York Times feature before the working paper was even released and a column devoted to it by Nicholas Kristof.

So is this a success story of working paper culture gone right: driving attention to good work faster, and then also drawing the benefits of peer review through the traditional publication process (and now continuing with open debate on non-gated sites)? Or is it a case of political hype driving attention inside and outside of the academy, the kind of thing that scares researchers and makes them want to retreat behind the slower, more process-laden research flow which they hope will protect them from exposure to embarrassment and protect the public from manipulation by the credulous news media? I think the process was okay even if we do conclude the paper wasn’t all it was made out to be. There were other reputational systems at work (faculty status, NBER membership, New York Times editors and sources) that may be as reliable as traditional peer review, which itself produces plenty of errors.

So, it’s an interesting situation — research methods, research implications, and research process.


Filed under Research reports

Cause and effect on myopia

It’s funny for a non-eye specialist to read articles about myopia, which in my line of work rarely means myopia, literally, which is nearsightedness. Takes some getting used to.

Anyway, in my book I use as an example of misleading correlations the link between night lights and myopia in children. Checking it to make sure it is still a good example to keep for the second edition, I was glad to see that it holds up well.

Here’s the story. In a 1999 paper (paywalled | sci-hub), Quinn and colleagues reported a “strong association between myopia and night-time ambient light exposure during sleep in children before they reach two years of age.” That is, kids who slept with night lights were more likely to be nearsighted. This was potentially big news, because we actually don’t fully understand why people become nearsighted, except we know it has to do with reading a lot and spending a lot of time indoors as a kid. They had some idea that light penetrating the eyelids at night might do something, but no real mechanism, just an association over a few hundred kids.

The paper didn’t have some important variables controlled, notably parents’ nearsightedness. Since the condition is also genetic, this was acknowledged as a problem. Still, they wrote:

Although it does not establish a causal link, the statistical strength of the association of night-time light exposure and childhood myopia does suggest that the absence of a daily period of darkness during early childhood is a potential precipitating factor in the development of myopia.

As I stress ad nauseam in this post, the “strength” of an association is not an argument for its causal power. And neither is the number of studies in which the association is found. Real spurious findings can produce very strong, easily-reproducible results. And when researchers have a story to fit, the rationale can seem strong. Also, the prospect of publishing in a top journal like Nature has to figure in there somewhere. (This problem is endemic in studies of, for example, family structure and child outcomes, among many other subjects.)

In this case there is a very nice explanation, which was reported less than a year later by Zadnik and colleagues (paywalled | sci-hub), who found no association between night lights and myopia, but did report a very strong relationship between night lights and parents’ myopia. The same pattern was reported in another response to the Quinn paper, in the same issue, by Gwiazda and colleagues. It appears that nearsighted parents like to leave night lights on. Alternatively, some other factor causes parental nearsightedness, child nearsightedness, and night light preference, such as education level (e.g., more-educated people read more and use night lights more).
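To make the logic concrete, here is a minimal sketch of how a confounder can produce a strong pooled association with no association inside either group of parents. The counts are hypothetical, invented for illustration; they are not taken from the Quinn, Zadnik, or Gwiazda papers.

```python
# Hypothetical counts, invented for illustration only.
# Keys: (parents_myopic, night_light) -> (children, myopic_children)
counts = {
    (True, True): (700, 280),    # myopic parents, night light on
    (True, False): (300, 120),   # myopic parents, no night light
    (False, True): (200, 20),    # other parents, night light on
    (False, False): (800, 80),   # other parents, no night light
}


def myopia_rate(cells):
    """Share of children who are myopic across the given cells."""
    children = sum(n for n, _ in cells)
    myopic = sum(m for _, m in cells)
    return myopic / children


def risk_ratio(table, parents=None):
    """Myopia rate with a night light divided by the rate without one.

    If parents is True or False, only that stratum is used; None pools both.
    """
    strata = [parents] if parents is not None else [True, False]
    lit = [table[(p, True)] for p in strata]
    dark = [table[(p, False)] for p in strata]
    return myopia_rate(lit) / myopia_rate(dark)


pooled = risk_ratio(counts)
within_myopic_parents = risk_ratio(counts, parents=True)
within_other_parents = risk_ratio(counts, parents=False)

print(f"pooled risk ratio: {pooled:.2f}")
print(f"myopic parents only: {within_myopic_parents:.2f}")
print(f"other parents only: {within_other_parents:.2f}")
```

In this made-up table the night light does nothing within either group of parents, but because myopic parents both use night lights more and have more myopic children, the pooled comparison shows a large gap.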

Several other studies have also failed to confirm the night-light theory, and now the thing seems to have blown over. It’s not a perfect example, because the bivariate correlation isn’t always found, but I like it as a family-related case. So I think I’ll keep it in.


Filed under Uncategorized

On Asian-American earnings

In a previous post I showed that generalizations about Asian-American incomes often are misleading, as some groups have above-average incomes and some have below-average incomes (also, divorce rates) and that inequality within Asian-American groups was large as well. In this post I briefly expand that to show breakdowns in individual earnings by gender and national-origin group.

The point is basically the same: This category is usually not useful for economic statistics, and should usually be dropped in favor of data on specific groups when possible.

Today’s news

What’s new is a Pew report by Eileen Patten showing trends in race and gender wage gaps. The report isn’t focused on Asian-American earnings, but they stand out in their charts. This led Charles Murray, who is fixated on what he believes is the genetic origin of Asian cognitive superiority, to tweet sarcastically, “Oppose Asian male privilege!” Here is one of Pew’s charts:


The figure, using the Current Population Survey (CPS), shows Asian men earning about 14.5% more per hour than White men, and Asian women earning 11% more than White women. This is not wrong, exactly, but it’s not good information either, as I’ll argue below.

First a note on data

The CPS data is better for some labor force questions (including wages) than the American Community Survey, which is much larger. However, it’s too small a sample to get into detail on Asian subgroups (notice the Pew report doesn’t mention American Indians, an even smaller group). To do that I will need to activate the ACS, which is better for race/ethnic detail.

As a reminder, this is the “race” question on the 2014 American Community Survey, which I use for this post:


There is no “Asian” or “Pacific Islander” box to check. So what do you do if you are thinking, “I’m Asian, what do I check?” The question is premised on the assumption that that is not what you’re thinking. Instead, you choose from a list of national origins, which the Census Bureau then combines to make “Asian” (the first 7 boxes) and “Pacific Islander” (the last 3) categories. And you can check as many as you like, which is good because there’s a lot of intermarriage among Asians, and between Asians and other groups (mostly Whites). This is a lot like the Hispanic origin question, which also lists national origins, except that that question is prefaced by the unifying phrase, “Is Person 1 of Hispanic, Latino, or Spanish origin?” before listing the options, each beginning with “Yes”, as in “Yes, Cuban.”

Although changes have not been announced, it is likely that future questions will combine the race and Hispanic-origin questions, and also preface the Asian categories with the umbrella term. This may mark the progress of getting Asian immigrants to internalize the American racial classification system, so that descendants from groups that in some cases have centuries-old cultural differentiation start to identify and label themselves as from the same racial group (who would have put Pakistanis and Japanese in the same “race” group 100 years ago?). It’s hard to make this progress, naturally, when so many people from these groups are immigrants — in my sample below, for example, 75% of the full-time, year-round workers are foreign-born.


The problem with the earnings chart Pew posted, and which Charles Murray loved, is that it lumps all the different Asian-origin groups together. That is not crazy but it’s not really good. Of course every group has diversity within it, so any category masks differences, but in my opinion this Asian grouping is worse in that regard than most. If someone argued that all these groups see themselves as united under a common identity that would push me in the direction of dropping this complaint. In any event, the diversity is interesting even if you don’t object to the Pew/Census grouping.

Here are two breakouts. The first is immigration. As I noted, 75% of the full-time, year-round workers (excluding self-employed people, like Pew does) with an Asian/Pacific Islander (Asian for short) racial identification are foreign born. That ranges from less than 4% for Hawaiians, to around 20% for the White+Asian multiple-race people, to more than 90% for Asian Indian men. It turns out that the wage advantage is mostly concentrated among these immigrants. Here is a replication of the Pew chart using the ACS data (a little different because I had to use FTFY workers), using the same colors. On the left is their chart, on the right is the same data limited to US-born workers.


Among the US-born workers the Asian male advantage is reduced from 14.5% to 4.2% (the women’s advantage is not much changed; as in Pew’s chart, Hispanics are a mutually exclusive category.) There are some very high-earning Asian immigrants, especially Indians. Here are the breakdowns, by gender, comparing each of the larger Asian-American groups to Whites:


Seven groups of men and nine groups of women have hourly earnings higher than Whites’, while nine groups of men and seven groups of women have lower earnings. In fact, among Laotians, Hawaiians, and Hmong, even the men earn less than White women. (Note, in my old post, I showed that Asian household incomes are not as high as they look when they are compared instead with those of their local peers, because they are concentrated in expensive metropolitan markets.)

Sometimes when I have a situation like this I just drop the relatively small, complex group, which leads some people to accuse me of trying to skew results. (For example, I might show a chart that has Blacks in the worst position, even though American Indians have it even worse.)

But generalization has consequences, so we should use it judiciously. In most cases “Asian” doesn’t work well. It may make more sense to group people by regions, such as East-, South-, and Southeast Asia, and/or according to immigrant status.


Filed under In the news

Life table says divorce rate is 52.7%

After the eternal bliss, there are two ways out of marriage: divorce or death.

I have posted my code and calculations for divorce rates using the 2010-2012 American Community Survey as an Open Science Framework project. The files there should be enough to get you started if you want to make multiple-decrement life tables for divorce or other things.

Because the American Community Survey records year of marriage, and divorce and widowhood, it’s perfectly set up for a multiple-decrement life table approach. A multiple-decrement life table uses the rate of each of two exits for each year of the original state (in this case marriage), to project the probability of either exit happening at or after a given year of marriage. It’s a projection of current rates, not a prediction of what will happen. So, if you write a headline that says, “your chance of divorce if you marry today is 52.7%,” that would be too strong, because it doesn’t take into account that the world might change. Also, people are different.

The divorce rate of 52.7% can accurately be described like this: “If current divorce and widowhood rates remain unchanged, 52.7% of today’s marriages would end in divorce before widowhood.” Here is a figure showing the probability of divorce at or after each year of the model:


So there’s 52.7% up at year 0. Marriages that make it to year 15 have a 30% chance of eventually divorcing, and so on.
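For readers who want the mechanics, here is a minimal sketch of a two-exit (multiple-decrement) projection. It is not the code posted on the OSF project, and the annual exit probabilities are hypothetical placeholders rather than ACS estimates; the function name `project_exits` is also just an illustration.

```python
def project_exits(divorce_rates, widowhood_rates, start_year=0):
    """Project exits for marriages intact at the start of `start_year`.

    Each list holds the probability of that exit during a given year of
    marriage, conditional on the marriage being intact when the year begins.
    Returns (share divorcing, share widowed, share still married at the end).
    """
    intact = 1.0
    divorced = 0.0
    widowed = 0.0
    for q_div, q_wid in zip(divorce_rates[start_year:],
                            widowhood_rates[start_year:]):
        divorced += intact * q_div          # exits by divorce this year
        widowed += intact * q_wid           # exits by widowhood this year
        intact *= 1.0 - q_div - q_wid       # marriages carried forward
    return divorced, widowed, intact


# Hypothetical rates for 60 years of marriage, invented for illustration.
divorce_rates = [0.03] * 10 + [0.02] * 10 + [0.01] * 20 + [0.005] * 20
widowhood_rates = [0.001] * 20 + [0.01] * 20 + [0.05] * 20

at_year_0 = project_exits(divorce_rates, widowhood_rates)[0]
at_year_15 = project_exits(divorce_rates, widowhood_rates, start_year=15)[0]

print(f"projected to divorce, from year 0: {at_year_0:.1%}")
print(f"projected to divorce, from year 15: {at_year_15:.1%}")
```

Starting the loop at a later year gives the projected chance of eventual divorce for marriages that have already lasted that long, which is what each point in a figure of this kind represents.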

Because the ACS doesn’t record anything about the spouses of divorced or widowed people, I don’t know who was married to whom, or the spouse’s age, education, race-ethnicity, or even sex. So the estimates differ by sex as well as other characteristics. I estimated a bunch of them in the spreadsheet file on the OSF site, but here are the bottom lines, showing, for example, that second or higher-order marriages have a 58.5% projected divorce rate and Blacks have a 64.2% divorce rate, compared with 52.9% for Whites.


(The education ones should be taken with a grain of salt because education levels can change but this assumes they’re static.)

Check the divorce tag for other posts and papers on divorce.

The ASA-style citation to the OSF project would be like this:  Cohen, Philip N. 2016. “Multiple-Decrement Life Table Estimates of Divorce Rates.” Retrieved (osf.io/zber3).


Filed under Me @ work

Why I snarked on a 538 blog post (and I’m sorry)

Gaza. What does inequality have to do with it? (Photo by gloucester2gaza)

The first thing that bugged me about this blog post by Jay Ulfelder at Five Thirty Eight was not the most important thing. The first thing I reacted to was that Ulfelder opened by asking whether “economic inequality causes political turmoil,” and then chastising, “Just because a belief is widely held, however, does not make it true,” before offering only evidence from economics studies. So I tweeted this obnoxious thing:

It was obnoxious, and I apologize. That response was part of my routine, defensive, complaining about how complex sociological work is neglected in favor of glib economics (e.g., here, here, here). But I do substantively object to the piece. If I had taken the time to figure out what really bugged me about it I could have sent a more constructive Tweet. Oh well, you never get a second chance to make a first snarky response.

What really bugged me is that the piece reduced this question of world-historic importance to a matter of microdata quality and measurement:

In fact, it’s still hard to establish with confidence whether and how economic inequality shapes political turmoil around the world. That’s largely because of the difficulty in measuring inequality…

Despite the slipperiness of “whether and how,”* Ulfelder’s point is definitely that we are “not there yet” on the question of “the belief that inequality causes political crises.” Still, maybe this is a case of trying to sell a narrow empirical piece as something bigger than it is — in which case it’s also a lesson in how people overreact when you do that.

I have to examine my own motives here, because this is one of those times when someone’s empirical claims threaten something that I don’t routinely subject to empirical testing. If there is an actual article of faith in my sociological worldview — and I would not really use the word faith to describe it, it’s more like a foundational understanding — it’s that inequality causes conflict, which causes social change. Ulfelder notes this is attributed partly to Marx, which is one reason why I and so many other sociologists hold it dear, but it’s also because it’s actually true. But that depends on what you mean by true, and here I think I disagree with Ulfelder, who writes:

With such incomplete and blurry information about the crucial quantities, why are so many of us so sure that economic inequality is a principal cause of political turmoil? Careful observation is one answer. Aristotle and Marx drew inferences about the destabilizing effects of inequality from their deep knowledge of the societies around them.

He never explains why this isn’t good enough, instead wandering into a critique of contemporary activist claims, based in part on an argument that “the seminal economic study” on the question is methodologically flawed (I’m sure it is).

This reducing of the question is too reductionist. I would be very interested to know whether within-country economic inequality, measured at the national level, if accurately measured, could help predict which countries would experience political turmoil, if that could be measured with a single indicator. But that’s not answering the question of whether inequality causes political turmoil — it’s one very narrow slice of that giant historical question, for which we have many sources of data and many affirmative answers.

Use a little of Marx’s “deep knowledge of societies” to consider, for example, the anti-colonial revolutions in many countries after 1945. Do you need to test a within-country economic inequality measure to know that such “turmoil” was one consequence of inequality? Of course, the timing and nature of those revolts is an interesting question to be addressed through research, but is such research asking whether inequality causes conflict?

What about slave revolts? What if someone found that harsher slavery regimes were not more likely to explode in revolt than those in which the slaves had enough food and water — would that tell you that inequality does not “cause” conflict? (Inequality causes conflict; that’s why they’re called slave revolts.)

Even, what about the civil rights movement, women’s movement, gay rights movement, or Black Lives Matter?

Does inequality cause conflict? Yes. Of course the relationship is not necessarily linear or simplistically univariate, which is the subject of lots of great sociology (and probably some minor work in other disciplines). But this is the kind of complex issue that data journalism nowadays loves to turn into yes-or-no, show-me-the-scatterplot short blog posts. I’ve done some of that myself, of course — and if I do it with something that’s a vital part of your analytical worldview, feel free to send me a snarky tweet about it.

* Nothing against this expression in general, it’s just slippery in this case because it might or might not be moving the goalposts from the opening question. 


Filed under Politics

Age composition change accounts for about half of the Case and Deaton mortality finding

This paper by Anne Case and Angus Deaton, one of whom just won a Nobel prize in economics, reports that mortality rates are rising for middle-aged non-Hispanic Whites. It’s gotten tons of attention (see e.g., “Why poor whites are dying of despair” in The Week, and this in NY Times).

It’s an odd paper, though, in its focus on just one narrow age group over time. The coverage mostly describes the result as if conditions are changing for a group of people, but the group of people changes every year as new 45-year-olds enter and 54-year-olds leave. That means the population studied is subject to change in its composition. This is especially important because the Baby Boom wave was moving through this group part of that time. The 1999-2013 time frame included Baby Boomers (born 1946-1964) from age 35 to age 67.

My concern is that changes in the age and sex composition of the population studied could account for a non-trivial amount of the trends they report.

For example, they report that the increased mortality is entirely concentrated among those non-Hispanic White men and women who have high school education or less. But this population changed from 1999 to 2013. Using the Current Population Survey (which is not the authority on population trends, but is weighted to reflect Census Bureau estimates of population trends), I see that this group became more male, and older, over the period studied. That’s because the Baby Boomers moved in, causing aging, and because the population reflects women’s advances in education, relative to men, circa the 1970s. Here are those trends:


It’s odd for a paper on mortality trends not to account for sex and age composition changes in the population over time. Even if the effects aren’t huge, I think that’s just good demography hygiene. Now, I don’t know exactly how much difference these changes in population composition would make on mortality rates, because I don’t have the mortality data by education. That would only make a difference if the mortality rates differed a lot by sex and age.

However, setting aside the education issue, we can tell something just looking at the whole non-Hispanic White population, and it’s enough to raise concerns. In the overall 45-54 non-Hispanic White population, there wasn’t any change in sex composition. But there was a distinct age shift. For this I used the 2000 Census and 2013 American Community Survey. I could get 1999 estimates to match Case and Deaton, but 2000 seems close enough and the Census numbers are easier to get. (That makes my little analysis conservative because I’m lopping off one year of change.)

Look at the change in the age distribution between 2000 and 2013 among non-Hispanic Whites ages 45-54. In this figure I’ve added the birth year range for those included in 2000 and 2013.


That shocking drop at age 54 in 2000 reflects the beginning of the Baby Boom. In 2000 there were a lot more 53-year-olds than there were 54-year-olds, because the Baby Boom started in 1946. (Remember, unlike today’s marketing-term “generations,” the Baby Boom was a real demographic event.) So there was a general aging, but also a big increase in 54-year-olds, between 2000 and 2013, which will naturally increase the mortality rate for that year.

So, to see whether the age shift had a non-trivial impact on the number of deaths in this population, I used one set of mortality rates: 2010 rates for non-Hispanic Whites by single year of age, published here. And I used the age and sex compositions as described above (even though the sex composition barely changed I did it separately by sex and summed them).

The 2010 age-specific mortality rates applied to the 2000 population produce a death rate of 3.939 per 1,000. When applied to the 2013 population they produce a death rate of 4.057 per 1,000. That’s the increase associated with the change in age and sex composition. How big is that difference? The 2013 death rate implies 118,313 deaths in 2013. The 2000 death rate implies 114,869 deaths in 2013. The difference is 3,443 deaths. Remember, this assumes age-specific death rates didn’t change, which is what you want to assess effects of composition change.

So I can say this: if age and sex composition had stayed the same between 2000 and 2013, there would have been 3,443 fewer deaths among non-Hispanic Whites in the ages 45-54.
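The calculation above is a standardization exercise: hold the age-specific death rates fixed and swap in two different age distributions. Here is a minimal sketch of that idea. The rates and populations are hypothetical and invented for illustration (they are not the 2010 rates or the Census/ACS counts, and the sketch ignores sex); only the last two figures come from this post and the Case and Deaton quote.

```python
AGES = range(45, 55)

# Hypothetical inputs, invented for illustration only.
# Deaths per person per year, rising with age.
death_rates = {age: 0.003 + 0.0002 * (age - 45) for age in AGES}
# Same total population, but the later year is weighted toward older ages.
pop_early = {age: 1000 - 20 * (age - 45) for age in AGES}
pop_late = {age: 820 + 20 * (age - 45) for age in AGES}


def crude_death_rate(population, rates):
    """Deaths per 1,000 implied by one set of age-specific rates."""
    deaths = sum(population[age] * rates[age] for age in population)
    return 1000 * deaths / sum(population.values())


rate_early = crude_death_rate(pop_early, death_rates)
rate_late = crude_death_rate(pop_late, death_rates)
print(f"early composition: {rate_early:.3f} per 1,000")
print(f"late composition: {rate_late:.3f} per 1,000")

# Figures reported in the text: deaths associated with composition change,
# and the 2013 excess deaths in the Case and Deaton quote.
composition_deaths = 3_443
excess_deaths_2013 = 7_000
share = composition_deaths / excess_deaths_2013
print(f"share associated with composition change: {share:.0%}")
```

Because the same rates are applied to both populations, any difference between the two crude rates comes from the change in age composition alone.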

Here is what Case and Deaton say:

If the white mortality rate for ages 45−54 had held at their 1998 value, 96,000 deaths would have been avoided from 1999–2013, 7,000 in 2013 alone.

So, it looks to me like age composition change accounts for about half of the rise in mortality they report (3,443 of the 7,000 deaths they count for 2013). They really should have adjusted for age.

Here is my spreadsheet table (you can download the file here):


As always, happy to be credited if I’m right, and told if I’m wrong. But if you just have suggestions for more work I could do, that might not work.

Follow up: Andrew Gelman has three excellent posts about this. Here’s the last.


Filed under Research reports

How we really can study divorce using just five questions and a giant sample

It would be great to know more about everything, but if you ask just these five questions of enough people, you can learn an awful lot about marriage and divorce.


First the questions, then some data. These are the question wordings from the 2013 American Community Survey (ACS).

1. What is Person X’s age?

We’ll just take the people who are ages 15 to 59, but that’s optional.

2. What is this person’s marital status?

Surprisingly, we don’t want to know if they’re divorced, just if they’re currently married (I include people who are separated and those who live apart from their spouses for other reasons). This is the denominator in your basic “refined divorce rate,” or divorces per 1000 married people.

3. In the past 12 months, did this person get divorced?

The number of people who got divorced in the last year is the numerator in your refined divorce rate. According to the ACS in 2013 (using population weights to scale the estimates up to the whole population), there were 127,571,069 married people, and 2,268,373 of them got divorced, so the refined divorce rate was 17.8 per 1,000 married people. When I analyze who got divorced, I’m going to mix all the currently-married and just-divorced people together, and then treat the divorces as an event, asking, who just got divorced?

4. In what year did this person last get married?

This is crucial for estimating divorce rates according to marriage duration. When you subtract this from the current year, that’s how long they are (or were) married. When you subtract the marriage duration from age, you get the age at marriage. (For example, a person who is 40 years old in 2013, who last got married in 2003, has a marriage duration of 10 years, and an age at marriage of 30.)

5. How many times has this person been married?

I use this to narrow our analysis down to women in their first marriages, which is a conventional way of simplifying the analysis, but that’s optional.
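
As a rough sketch of how the five answers combine, here is the arithmetic in Python. The rate uses the weighted 2013 totals quoted above. The `Person` record and its field names are hypothetical stand-ins, not ACS or IPUMS variable names.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical field names, one per survey question.
    age: int
    currently_married: bool
    divorced_last_year: bool
    year_last_married: int
    times_married: int


def refined_divorce_rate(divorces, married):
    """Divorces per 1,000 married people."""
    return 1000 * divorces / married


def marriage_duration(person, survey_year):
    """Years since the most recent marriage began."""
    return survey_year - person.year_last_married


def age_at_marriage(person, survey_year):
    """Current age minus marriage duration."""
    return person.age - marriage_duration(person, survey_year)


rate_2013 = refined_divorce_rate(2_268_373, 127_571_069)
print(round(rate_2013, 1))  # 17.8

example = Person(age=40, currently_married=True, divorced_last_year=False,
                 year_last_married=2003, times_married=1)
print(marriage_duration(example, 2013), age_at_marriage(example, 2013))  # 10 30
```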


I restrict the analysis below to women, which is just a sexist convention for simplifying things (since men and women do things at different ages).*

So here are the 375,249 women in the 2013 ACS public use file, ages 16-59, who were in their first marriages, or just divorced from their first marriages, by their age at marriage and marriage duration. Add the two numbers together and you get their current age. The colors let you see the basic distribution (click to enlarge):

2011-2013 agemar figures.xlsx

The most populous cell on the table is 28-year-olds who got married three years ago, at age 25, with 1068 people. The least populous is 19-year-olds who got married at 15 (just 14 of them). The diagonal edge reflects my arbitrary cutoff at age 59.

Divorce results

Now, in each of these cells there are married people, and (in most of them) people who just got divorced. The ratio between those two frequencies is a divorce rate — one specific to the age at marriage and marriage duration. To make the next figure I used three years of ACS data (2011-2013) so the results would be smoother. (And then I smoothed it more by replacing each cell with an average of itself and the adjoining cells.) These are the divorce rates by age at marriage and years married (click to enlarge):

2011-2013 agemar figures.xlsx

The overall pattern here is more green, or lower divorce rates, to the right (longer duration of marriage) and down (older age at marriage). So the big red patch is the first 12 years for marriages begun before the woman was age 25. And after about 25 years of marriage it’s pretty much green, for low divorce rates. The high contrast at the bottom left implies an interesting high risk but steep decline in the first few years after marriage for these late marriages. This matrix adds nuance to the pattern I reported the other day, which featured a little bump up in divorce odds for people who married in their late thirties. From this figure it looks like marriages that start after the woman is about 35 might have less of a honeymoon period than those beginning about age 24-33.
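
The cell rates and the smoothing step described above can be sketched in a few lines of Python. This is not the code behind the figure. The counts are invented, and the sketch assumes "adjoining cells" means the up-to-eight neighbors in the grid, averaged without weights and skipping empty cells. The denominator pools married and just-divorced women, following the refined divorce rate defined earlier.

```python
def cell_rates(married, divorced):
    """Divorces per 1,000 in each (age at marriage, duration) cell.
    Cells with no women are returned as None."""
    rates = []
    for m_row, d_row in zip(married, divorced):
        row = []
        for m, d in zip(m_row, d_row):
            total = m + d
            row.append(1000 * d / total if total else None)
        rates.append(row)
    return rates


def smooth(rates):
    """Replace each non-empty cell with the unweighted mean of itself
    and its adjoining non-empty cells (a 3x3 window, an assumption)."""
    n_rows, n_cols = len(rates), len(rates[0])
    out = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            if rates[i][j] is None:
                row.append(None)
                continue
            window = [
                rates[a][b]
                for a in range(max(i - 1, 0), min(i + 2, n_rows))
                for b in range(max(j - 1, 0), min(j + 2, n_cols))
                if rates[a][b] is not None
            ]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


# Invented counts: rows are ages at marriage, columns are years married.
married = [[90, 95, 0], [98, 99, 100]]
divorced = [[10, 5, 0], [2, 1, 0]]

raw = cell_rates(married, divorced)
smoothed = smooth(raw)
print(raw)
print(smoothed)
```

Smoothing trades detail for stability: a cell with only a handful of divorces borrows information from its neighbors, at the cost of blurring sharp edges like a first-year spike.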

To learn more, I go beyond those five great questions, and use a regression model (same as the other day), with a (collapsed) marriage-age–by–marriage-duration matrix. So these are predicted divorce rates per 1000, holding education, race/ethnicity, and nativity constant (click to enlarge)**:

2011-2013 agemar figures.xlsx

The controls cut down the late-thirties bump and isolate it mostly to the first year. This also shows that the punishing first year is an issue for all ages over 35. The late thirties just showed the bump because that group doesn’t have the big drop in divorce after the first year that the later years do. Interesting!


Here’s where the awesome data let us down. This data is very powerful. It’s the best contemporary big data set we have for analyzing divorce. It has taken us this far, but it can’t explain a pattern like this.

We can control for education, but that’s just the education level at the time of the most recent survey. We can’t know when she got her education relative to the dates of her marriage. Further, from the ACS we can’t tell how many children a person has had, with whom, and when — we only know about children who happen to be living in the household in 2013, so a 50-year-old could be childfree or have raised and released four kids already. And about couples, although we can say things about the other spouse from looking around in the household (such as his age, race, and income), if someone has divorced, the spouse is gone and there is no information about that person (even their sex). So we can’t use that information to build a model of divorce predictors.

Here’s an example of what we can only hint at. Remarriages are more likely to end in divorce, for a variety of reasons, which is why we simplify these things by only looking at first marriages. But what about the spouse? Some of these women are married to men who’ve been married before. I can’t tell how much that contributes to their likelihood of divorce, but it almost certainly does. Think about the bump up in the divorce rate for women who got married in their late thirties. On the way from high divorce rates for women who marry early to low rates for women who marry late, the overall downward slope reflects increasing maturity and independence for women, but it’s running against the pressure of their increasingly complicated relationship situations. That late-thirties bump may have to do with the likelihood that their husbands have been married before. Here’s the circumstantial evidence:

2011-2013 agemar figures.xlsx

See that big jump from early-thirties to late-thirties? All of a sudden 37.5% of women marrying in their late-thirties are marrying men who are remarrying. That’s a substantial risk factor for divorce, and one I can’t account for in my analysis (because we don’t have spouse information for divorced women).

On method

Divorce is complicated and inherently longitudinal. Marriages arise out of specific contexts and thrive or decay in many different ways. Yesterday’s crucial influence may disappear today. So how can we say anything about divorce using a single, cross-sectional survey sample? The unsatisfying answer is that all analysis is partial. But these five questions give us a lot to go on, because knowing when a person got married allows us to develop a multidimensional image of the events, as I’ve demonstrated here.

But, you ask, what can we learn from, say, the divorce propensity of today’s 40-year-olds when we know that just last year a whole bunch of 39-year-olds divorced, skewing today’s sample? This is a real issue. And demography provides an answer that is at once partial and powerful: Simple, we use today’s 39-year-olds, too. In the purest form, this approach gives us the life table, in which one year’s mortality rates — at every age — lead to a projection of life expectancy. Another common application is the total fertility rate (watch the video!), which sums birth rates by age to project total births for a generation. In this case I have not produced a complete divorce life table (which I promised a while ago — it’s coming). But the approach is similar.

These are all synthetic cohort approaches (described nicely in the Week 6 lecture slides from this excellent Steven Ruggles course). In this case, the cohorts are age-at-marriage groups. Look at the table above and follow the row for, say, marriages that started at age 28, to see that synthetic cohort’s divorce experience from marriage until age 59. It’s neither a perfect depiction of the past, nor a foolproof prediction of the future. Rather, it tells us what’s happening now in cohort terms that are readily interpretable.
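
To make the synthetic cohort idea concrete, here is a small Python sketch. It chains one period's duration-specific divorce rates along a single age-at-marriage row to get the share of marriages still intact after each year. The rates are invented for illustration, the function name is hypothetical, and the calculation ignores widowhood and mortality, so it is far short of a complete divorce life table.

```python
def share_still_married(divorce_rates_per_1000):
    """Share of a synthetic marriage cohort not yet divorced after each
    year, if it lived through today's duration-specific rates in order."""
    share = 1.0
    path = []
    for rate in divorce_rates_per_1000:
        share *= 1 - rate / 1000
        path.append(share)
    return path


# Invented rates for the first five years of one age-at-marriage row.
row_rates = [30, 25, 20, 20, 15]
path = share_still_married(row_rates)
for year, share in enumerate(path, start=1):
    print(year, round(share, 3))
```

No real group of women experienced all of these rates in sequence. Each rate comes from a different set of marriages observed in the same period, which is what makes the cohort synthetic.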


The ACS is the best thing we have for understanding the basic contours of divorce trends and patterns. Those five questions are invaluable.

* For this I also tossed the people who were reported to have married in the current year, because I wasn’t sure about the timing of their marriages and divorces, but I put them back in for the regressions.

** The codebook for my IPUMS data extraction is here, my Stata code is here. The heat-map model here isn’t in that code file, but these are the commands (and the margins command took a very long time, so please don’t tell me there’s something wrong with it):

logistic divorce i.agemarc#i.mardurc i.degree i.race i.hispan i.citizen
margins i.agemarc#i.mardurc


Filed under Me @ work