Tag Archives: methods

Framing social class with sample selection

A lot of qualitative sociology makes comparisons across social class categories. Many researchers build class into their research designs by selecting subjects using broad criteria, most often education level, income level, or occupation. Depending on the set of questions at hand, the class selection categories will vary, focusing on, for example, upbringing and socialization, access to resources, or occupational outlook.

In the absence of a substantive review, here are a few arbitrarily selected exemplar books from my areas of research:

This post was inspired by the question Caitlyn Collins asked the other day on Twitter:

She followed up by saying, “Social class is nebulous, but precision here matters to make meaningful claims. What do we mean when we say we’re talking to poor, working class, middle class, wealthy folks? I’m looking for specific demographic questions, categories, scales sociologists use as screeners.” The thread generated a lot of good ideas.

Income, education, occupation

Screening people for research can be costly and time consuming, so you want to maximize simplicity as well as clarity. So here’s a way of looking at some common screening variables, and what you might get or lose by relying on them in different combinations. This uses the 2018 American Community Survey, provided by IPUMS.org (Stata data file and code here).

  • I used income, education, and occupation to identify the status of individuals, and generated household class categories by the presence or absence of types of people in each. That means everyone in each household is in the same class category (a choice you might or might not want to make).
  • Income: Total household income divided by an equivalency scale (for cost of living). The scale counts each adult as 1 person and each child under 18 as .70, and then raises that count to the power of .70. I divided the resulting distribution into thirds, so households are in the top, middle, or bottom third. Top third is what I called “middle/upper” class, bottom third is “lower class.”
  • Education: I use BA degree to identify households that have (middle/upper) or don’t have (lower) a four-year college graduate present. This is 31% of adults.
  • Occupation: I used the 2018 ACS occupation codes, and coded people as middle/upper class if their code was 10 to 3550, which are management, business, and financial occupations; computer, engineering, and science occupations; education, legal, community service, arts, and media occupations; and healthcare practitioners and technical occupations. It’s pretty close to what we used to call “managerial and professional” occupations. Together, these account for 37% of workers.
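
The three screening rules above can be sketched in a few lines of Python. This is only an illustration, not the Stata code linked in the post: the function names, the dictionary keys, and the unweighted tercile cutoffs are all assumptions, and the households at the bottom are made up.

```python
def equivalized_income(household_income, adults, children):
    """Divide income by (adults + 0.70 * children) raised to the power 0.70."""
    scale = (adults + 0.70 * children) ** 0.70
    return household_income / scale


def tercile_cutoffs(values):
    """Two cutoffs that split the values into thirds (unweighted, for illustration)."""
    ordered = sorted(values)
    n = len(ordered)
    return ordered[n // 3], ordered[(2 * n) // 3]


def classify_household(adjusted_income, low_cut, high_cut, members):
    """Apply the income, education, and occupation screens to one household.

    `members` is a list of dicts with hypothetical keys "has_ba" (bool) and
    "occ_code" (2018 ACS occupation code, or None if not working).
    """
    if adjusted_income >= high_cut:
        income_class = "middle/upper"
    elif adjusted_income < low_cut:
        income_class = "lower"
    else:
        income_class = "middle"  # neither middle/upper nor lower on income
    has_ba = any(m["has_ba"] for m in members)
    upper_occ = any(
        m["occ_code"] is not None and 10 <= m["occ_code"] <= 3550 for m in members
    )
    return {
        "income": income_class,
        "education": "middle/upper" if has_ba else "lower",
        "occupation": "middle/upper" if upper_occ else "lower",
    }


# Made-up households: (income, adults, children, members)
households = [
    (30_000, 1, 2, [{"has_ba": False, "occ_code": 4720}]),
    (95_000, 2, 1, [{"has_ba": True, "occ_code": 2310},
                    {"has_ba": False, "occ_code": None}]),
    (60_000, 2, 0, [{"has_ba": False, "occ_code": 120},
                    {"has_ba": False, "occ_code": 6230}]),
]
adjusted = [equivalized_income(inc, a, c) for inc, a, c, _ in households]
low_cut, high_cut = tercile_cutoffs(adjusted)
results = [
    classify_household(adj, low_cut, high_cut, h[3])
    for adj, h in zip(adjusted, households)
]
```

Because the flags are set at the household level, one BA holder or one professional worker is enough to put everyone in that household in the middle/upper category on that variable.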

So each of these three variables identifies an upper/middle class status of about a third of people.

For lower class status, you can just reverse them. The exception is income, which is in three categories. For that, I counted households as lower class if their household income was in the bottom third of the adjusted distribution. In the figures below, that means they’re neither middle/upper class nor lower class if they’re in the middle of the income distribution. This is easily adjusted.

Venn diagrams

You can make Venn diagrams in Stata using the pvenn2 add-on, which I naturally discovered after making these. If you must know, I made these by generating tables in Stata, downloading this free plotter app, entering the values manually, copying the resulting figures into Powerpoint and applying the text there, then printing them to PDF, and extracting the images from PDF using Photoshop. Not a recommended workflow.

Here they are. I hope the visuals might help people think about, for example, who they might get if they screened on just one of these variables, or how unusual someone is who has a high income or occupation but no BA, and so on. But draw your own conclusions (and feel free to modify the code and follow your own approach). Click to enlarge.

First middle/upper class:

Venn diagram of overlapping class definitions

Then lower class:

Venn diagram of overlapping class definitions.

I said draw your own conclusions, but please don’t draw the conclusion that I think this is the best way to define social class. That’s a whole different question. This is just about simple ways to select people to be research subjects. For other posts on social class, follow this tag, which includes this post about class self identification by income and race/ethnicity.

Data and code: osf.io/w2yvf/

Filed under Me @ work

Divorce fell in one Florida county (and 31 others), and you will totally believe what happened next

You can really do a lot with the common public misperception that divorce is always going up. Brad Wilcox has been taking advantage of that since at least 2009, when he selectively trumpeted a decline in divorce (a Christmas gift to marriage) as if it was not part of an ongoing trend.

I have reported that the divorce rate in the U.S. (divorces per married woman) fell 21 percent from 2008 to 2017.  And yet yesterday, Faithwire’s Will Maule wrote, “With divorce rates rocketing across the country, it can be easy to lose a bit of hope in the God-ordained bond of marriage.”

Anyway, now there is hope, because, as right-wing podcaster Lee Habeeb wrote in Newsweek, THE INCREDIBLE SUCCESS STORY BEHIND ONE COUNTY’S PLUMMETING DIVORCE RATE SHOULD INSPIRE US ALL. In fact, we may be on the brink of Reversing Social Disintegration, according to Seth Kaplan, writing in National Affairs. That’s because of the Culture of Freedom Initiative of the Philanthropy Roundtable (a right-wing funding aggregator run by people like Art Pope, Betsy DeVos, the Bradley Foundation, the Hoover Institution, etc.), which has now been spun off as Communio, a marriage ministry that uses marriage programs to support Christian churches. Writes Kaplan:

The program, which has recently become an independent nonprofit organization called Communio, used the latest marketing techniques to “microtarget” outreach, engaged local churches to maximize its reach and influence, and deployed skills training to better prepare individuals and couples for the challenges they might face. COFI highlights how employing systems thinking and leveraging the latest in technology and data sciences can lead to significant progress in addressing our urgent marriage crisis.

The program claims 50,000 people attended four-hour “marriage and faith strengthening programs,” and further made 20 million Internet impressions “targeting those who fit a predictive model for divorce.” So, have they increased marriage and reduced divorce? I don’t know, and neither do they, but they say they do.

Funny aside, the results website today says “Communio at work: Divorce drops 24% in Jacksonville,” but a few days ago the same web page said 28%. That’s probably because Duval County (which is what they’re referring to) just saw a SHOCKING 6% INCREASE IN DIVORCE (my phrase) in 2018 — the 10th largest divorce rate increase in all 40 counties in Florida for which data are available (see below). But anyway, that’s getting ahead of the story.

Gimme the report

The 28% result came from this report by Brad Wilcox and Spencer James, although they don’t link to it. That’s what I’ll focus on here. The report describes the many hours of ministrations, and the 20 million Internet impressions, and then gets to the heart of the matter:

We answer this question by looking at divorce and marriage trends in Duval County and three comparable counties in Florida: Hillsborough, Orange, and Escambia. Our initial data analysis suggests that the COFI effort with Live the Life and a range of religious and civic partners has had an exceptional impact on marital stability in Duval County. Since 2016, the county has witnessed a remarkable decline in divorce: from 2015 to 2017, the divorce rate fell 28 percent. As family scholars, we have rarely seen changes of this size in family trends over such a short period of time. Although it is possible that some other factor besides COFI’s intervention also helped, we think this is unlikely. In our professional opinion, given the available evidence, the efforts undertaken by COFI in Jacksonville appear to have had a marked effect on the divorce rate in Duval County.

A couple things about these very strong causal claims. First, they say nothing about how the “comparable counties” were selected. Florida has 67 counties, 40 of which the Census gave me population counts for. Why not use them all? (You’ll understand why I ask when they get to the N=4 regression.) Second, how about that “exceptional impact,” the “remarkable decline” “rarely seen” in their experience as family scholars? Note there is no evidence in the report of the program doing anything, just the three year trend. And while it is a big decline, it’s one I would call “occasionally seen.” (It helps to know that divorce is generally going down, something the report never mentions.)

To put the decline in perspective, first a quick national look. In 2009 there was a big drop in divorce, accelerating the ongoing decline, presumably related to the recession (analyzed here). It was so big that nine states had crude divorce rate declines of 20% or more in that one year alone. Here is what 2008-2009 looked like:

state divorce changes 08-09.xlsx

So, a drop in divorce on this scale is not that rare in recent times. This is important background Wilcox is (comfortably) counting on his audience not knowing. So what about Florida?

Wilcox and James start with this figure, which shows the number of divorces per 1000 population in Duval County (Jacksonville), and the three other counties:

Again, there is no reason given for selecting these three counties. To test the comparison, which evidently shows a faster decline in Duval, they perform two regression models. (To their credit, James shared their data with me when I requested it — although it’s all publicly available this was helpful to make sure I was doing it the same way they did.) First, I believe they ran a regression with an N of 4, the dependent variable being the 2014-2017 decline in divorce rate, and the independent variable being a dummy for Duval. I share the complete dataset for this model here:

      div_chg      duval
1.   -1.116101       1
2.   -0.2544951      0
3.   -0.3307687      0
4.   -0.5048307      0
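
To see how little a regression on these four rows can tell you, here is a minimal sketch in Python (my reconstruction of what such a model would amount to, not their code; the helper name `ols_simple` is made up). With a single dummy predictor, ordinary least squares just returns the mean of the three comparison counties as the intercept and Duval's difference from that mean as the slope.

```python
# The four data points listed above: change in divorce rate and a Duval dummy.
div_chg = [-1.116101, -0.2544951, -0.3307687, -0.5048307]
duval = [1, 0, 0, 0]


def ols_simple(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = ols_simple(duval, div_chg)

# The same two numbers, computed directly as group means.
mean_comparison = sum(div_chg[1:]) / 3
duval_minus_comparison = div_chg[0] - mean_comparison

print(round(intercept, 4), round(slope, 4))
print(round(mean_comparison, 4), round(duval_minus_comparison, 4))
```

Since only one county is "treated," the model fits Duval exactly, so any standard error on the Duval coefficient comes entirely from the spread among the three comparison counties.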

I don’t know exactly what they did with the second model, which must somehow have a larger sample than 4 because it has 8 variables. Maybe 16 county-years? Anyway, doesn’t much matter. Here is their table:


How to evaluate a faster decline among a general trend toward lower divorce rates? If you really wanted to know if the program worked, you would have to study the program, people who were in the program and people who weren’t and so on. (See this writeup of previous marriage promotion disasters, studied correctly, for a good example.) But I’m quite confident that this conclusion is ridiculous and irresponsible: “In our professional opinion, given the available evidence, the efforts undertaken by COFI in Jacksonville appear to have had a marked effect on the divorce rate in Duval County.” No one should take such a claim seriously except as a reflection on the judgment or motivations of its author.

Because the choice of “comparison counties” was bugging me, I got the divorce counts from Florida’s Vital Statistics office (available here), and combined them with Census data on county populations (table S0101 on data.census.gov). Since 2018 has now come out, I’m showing the change in each county’s crude divorce rate from 2015, before Communio, through 2018.

florida divorce counties.xlsx

You can see that Duval has had a bigger drop in divorce than most Florida counties — 32 of which saw divorce rates fall in this period. Of the counties that had bigger declines, Monroe and Santa Rosa are quite small, but Lake County is mid-sized (population 350,000), and bigger than Escambia, which is one of the comparison counties. How different their report could have been with different comparison cases! This is why it’s a good idea to publicly specify your research design before you collect your data, so people don’t suspect you of data shenanigans like goosing your comparison cases.

What about that 2018 rebound? Wilcox and James stopped in 2017. With the 2018 data we can look further. Eighteen counties had increased divorce rates in 2018, and Duval’s was large at 6%. Two of the comparison cases (Hillsborough and Escambia) had decreases in divorce, as did the state’s largest county, Miami-Dade (down 5%).

To summarize, Duval County had a larger than average decline in divorce rates in 2014-2017, compared with the rest of Florida, but then had a larger-than-average increase in 2018. That’s it.


Obviously, Communio wants to see more marriage, too, but here not even Wilcox can turn the marriage frown upside down.


Why no boom in marriage, with all those Internet hits and church sessions? They reason:

This may be because the COFI effort did not do much to directly promote marriage per se (it focused on strengthening existing marriages and relationships), or it may be because the effort ended up encouraging Jacksonville residents considering marriage to proceed more carefully. One other possibility may also help explain the distinctive pattern for Duval County. Hurricane Irma struck Jacksonville in September of 2017; this weather event may have encouraged couples to postpone or relocate their weddings.

OK, got it. So they totally could have increased marriage if they had wanted to. Except for the hurricane. I can’t believe I did this, but I did wonder about the hurricane hypothesis. Here is the number of marriages per month in Duval County, from 13 months before Hurricane Irma (September 2017), to 13 months after, with Septembers highlighted.

jacksonville marriges.xlsx

There were fewer marriages in September 2017 than 2016, 51 fewer, but September is a slow month anyway. And they almost made up for it with a jump in December, which could be hurricane-related postponements. But then the following September was no better, so this hypothesis doesn’t look good. (Sheesh, how much did they get paid to do this report? I’m not holding back any of the analysis here.)

Aside: Kristen & Jessica had a beautiful wedding in Jacksonville just a few days after Hurricane Irma. Jessica recalled, “Hurricane Irma hit the week before our wedding, which damaged our venue pretty badly. As it was outdoors on the water, there were trees down all over the place and flooding… We were very lucky that everything was cleaned up so fast. The weather the day of the wedding turned out to be perfect!” I just had to share this picture, for the Communio scrapbook:


Photo by Jazi Davis in JaxMagBride.

So, to recap: Christian philanthropists and intrepid social scientists have pretty much reversed social disintegration and the media is just desperate to keep you from finding out about it.

Also, Brad Wilcox lies, cheats, and steals. And the people who believe in him, and hire him to carry their social science water, don’t care.

Filed under Research reports

Do rich people like bad data tweets about poor people? (Bins, slopes, and graphs edition)

Almost 2,000 people retweeted this from Brad Wilcox the other day.


Brad shared the graph from Charles Lehman (who noticed later that he had mislabeled the x-axis, but that’s not the point). First, as far as I can tell the values are wrong. I don’t know how they did it, but when I look at the 2016-2018 General Social Survey, I get 4.3 average hours of TV for people in the poorest families, and 1.9 hours for the richest. They report higher highs (looks like 5.3) and lower lows (looks like 1.5). More seriously, I have to object to drawing what purports to be a regression line as if those are evenly-spaced income categories, which makes it look much more linear than it is.

I fixed those errors — the correct values, and the correct spacing on the x-axis — then added some confidence intervals, and what I get is probably not worth thousands of self-congratulatory woots, although of course rich people do watch less TV. Here is my figure, with their line (drawn in by hand) for comparison:


Charles and Brad’s post got a lot of love from conservatives, I believe, because it confirmed their assumptions about self-destructive behavior among poor people. That is, here is more evidence that poor people have bad habits and it’s just dragging them down. But there are reasons this particular graph worked so well. First, the steep slope, which partly results from getting the data wrong. And second, the tight fit of the regression line. That’s why Brad said, “Whoa.” So, good tweet — bad science. (Surprise.) Here are some critiques.

First, this is the wrong survey to use. Since 1975, GSS has been asking people, “On the average day, about how many hours do you personally watch television?” It’s great to have a continuous series on this, but it’s not a good way to measure time use because people are bad at estimating these things. Also, GSS is not a great survey for measuring income. And it’s a pretty small sample. So if those are the two variables you’re interested in, you should use the American Time Use Survey (available from IPUMS), in which respondents are drawn from the much larger Current Population Survey samples, and asked to fill out a time diary. On the other hand, GSS would be good for analyzing, for example, whether people who believe the Bible is “the actual word of God and is to be taken literally, word for word” watch TV more than those who believe it is “an ancient book of fables, legends, history, and moral precepts recorded by men.” (Yes, they do, about an hour more.) Or looking at all the other social variables GSS is good for.

On the substantive issue, Gray Kimbrough pointed out that the connection between family income and TV time may be spurious, and is certainly confounded with hours spent at work. When I made a simple regression model of TV time with family income, hours worked, age, sex, race/ethnicity, education, and marital status (which again, should be done better with ATUS), I did find that both hours worked and family income had big effects. Here they are from that model, as predicted values using average marginal effects.

tv work faminc

The banal observation that people who spend more time working spend less time watching TV probably wouldn’t carry the punch. Anyway, neither resolves the question of cause and effect.

Fits and slopes

On the issue of the presentation of slopes, there’s a good lesson here. Data presentation involves trading detail for clarity. And statistics have both a descriptive and an analytical purpose. Sometimes we use statistics to present information in simplified form, which allows better comprehension. We also use statistics to discover relationships we couldn’t otherwise, such as multivariate relationships that you can’t discern visually. The analyst and communicator has to choose wisely what to present. A good propagandist knows what to manipulate for political effect (a bad one just tweets out crap until they get lucky).

Here’s a much less click-worthy presentation of the relationship between family income and TV time. Here I truncate the y-axis at 12 hours (cutting off 1% of the sample), translate the binned income categories into dollar values at the middle of each category, and then jitter the scatterplot so you can see how many points are piled up in each spot. The fitted line is Stata’s median spline, with 9 bands specified (so it’s the median hours at the median income in 9 locations on the x-axis). I guess this means that, at the median, rich people in America watch about an hour of TV per day less than poor people, and the action is mostly under $50,000 per year. Woot.

gss tv income

Finally, a word about binning and the presentation of data (something I’ve written about before, here and here). We make continuous data into categories all the time, starting from measurement. We usually measure age in years, for example, although we could measure it in seconds or decades. Then we use statistics to simplify information further, for example by reporting averages. In the visual presentation of data, there is a particular problem with using averages or data bins to show relationships — you can show slopes that way nicely, but you run the risk of making relationships look more closely correlated than they are. This happens in the public presentation of data when analysts are showing something of their work product — such as a scatterplot with a fitted line — to demonstrate the veracity of their findings. When they bin the data first, this can be very misleading.

Here’s an example. I took about 1000 men from the GSS, and compared their age and income. Between the ages of 25 and 59, older men have higher average incomes, but the fit is curved with a peak around 45. Here is the relationship, again using jittering to show all the individuals, with a linear regression line. The correlation is .23.

That might be nice to look at but it’s hard to see the underlying relationship. It’s hard to even see how the fitted line relates to the data. So you might reduce it by showing the average income at each age. By pulling the points together vertically into average bins, this shows the relationship much more clearly. However, it also makes the relationship look much stronger. The correlation in this figure is .65. Now the reader might think, “Whoa.”

Note this didn’t change the slope much (it still runs from about $30k to $60k), it just put all the dots closer to the line. Finally, here it is pulling the averages together in horizontal bins, grouping the ages in fives (25-29, 30-34 … 55-59). The correlation shown here is .97.


If you’re like me, this is when you figured out that reducing this to two dots would produce a correlation of 1.0 (as long as the dots aren’t exactly level).
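
Here is a minimal simulation of the same effect in Python. It does not use the GSS: the ages, the income equation, and the noise level are all made up, with a fixed seed so it is repeatable. The point is just that averaging within bins leaves the slope roughly alone while pushing the correlation up, and that two bins always give a correlation of plus or minus 1.

```python
import random
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def binned_means(ages, incomes, width):
    """Mean age and mean income within age bins of the given width, starting at 25."""
    groups = defaultdict(list)
    for age, income in zip(ages, incomes):
        groups[(age - 25) // width].append((age, income))
    mean_ages, mean_incomes = [], []
    for key in sorted(groups):
        points = groups[key]
        mean_ages.append(sum(p[0] for p in points) / len(points))
        mean_incomes.append(sum(p[1] for p in points) / len(points))
    return mean_ages, mean_incomes


# Simulated men ages 25-59 with a weak, noisy income gradient (assumed values).
rng = random.Random(0)
ages = [rng.randint(25, 59) for _ in range(1000)]
incomes = [30_000 + 900 * (a - 25) + rng.gauss(0, 40_000) for a in ages]

r_raw = pearson(ages, incomes)                         # every individual
r_by_year = pearson(*binned_means(ages, incomes, 1))   # mean income at each age
r_by_five = pearson(*binned_means(ages, incomes, 5))   # five-year age groups
r_two_dots = pearson(*binned_means(ages, incomes, 18)) # two bins: 25-42, 43-59

print(round(r_raw, 2), round(r_by_year, 2), round(r_by_five, 2), round(r_two_dots, 2))
```

The correlations here are unweighted across bins, which is what you implicitly get when you plot one dot per bin and draw a line through the dots.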

To make good data presentation tradeoffs requires experimentation and careful exposition. And, of course, transparency. My code for this post is available on the Open Science Framework here (you gotta get the GSS data first).


Filed under In the news

Decadally-biased marriage recall in the American Community Survey

Do people forget when they got married?

In demography, there is a well-known phenomenon known as age-heaping, in which people round off their ages, or misremember them, and report them as numbers ending in 0 or 5. We have a measure, known as Whipple’s index, that estimates the extent to which this is occurring in a given dataset. To calculate this you take the number of people between ages 23 and 62 (inclusive), and compare it to five-times the number of those whose ages end in 0 or 5 (25, 30 … 60), so there are five-times as many total years as 0 and 5 years.

If the ratio of 0/5s to the total is less than 105, that’s “highly accurate” by the United Nations standard, a ratio 105 to 110 is “fairly accurate,” and in the range 110 to 125 age data should be considered “approximate.”
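
As a minimal sketch of that calculation in Python (the function name and the toy age lists are mine, and it ignores survey weights):

```python
def whipple_index(ages):
    """Whipple's index: 100 * 5 * (ages ending in 0 or 5) / (all ages), for ages 23-62."""
    in_range = [a for a in ages if 23 <= a <= 62]
    heaped = [a for a in in_range if a % 5 == 0]
    return 100 * 5 * len(heaped) / len(in_range)


# One person at every age from 23 to 62: no heaping, so the index is 100.
no_heaping = list(range(23, 63))
print(whipple_index(no_heaping))

# Add a few extra people who report a round age: the index rises above 100.
some_heaping = no_heaping + [30, 40, 50, 60]
print(round(whipple_index(some_heaping), 1))
```

An index of 100 means ages ending in 0 or 5 are exactly one-fifth of the total, as you would expect with no digit preference; 500 would mean everyone reported an age ending in 0 or 5.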

I previously showed that the American Community Survey’s (ACS) public use file has a Whipple index of 104, which is not so good for a major government survey in a rich country. The heaping in ACS apparently came from people who didn’t respond to email or mail questionnaires and had to be interviewed by Census Bureau staff by phone or in person. I’m not sure what you can do about that.

What about marriage?

The ACS has great data on marriage and marital events, which I have used to analyze divorce trends, among other things. Key to the analysis of divorce patterns is the question, “When was this person last married?” (YRMARR) Recorded as a year date, this allows the analyst to take into account the duration of marriage preceding divorce or widowhood, the birth of children, and so on. It’s very important and useful information.

Unfortunately, it may also have an accuracy problem.

I used the ACS public use files made available by IPUMS.org, combining all years 2008-2017, the years they have included the variable YRMARR. The figure shows the number of people reported to have last married in each year from 1936 to 2015. The decadal years are highlighted in black. (The dropoff at the end is because I included surveys earlier than those years.)

year married in 2016.xlsx

Yikes! That looks like some decadal marriage year heaping. Note I didn’t highlight the years ending in 5, because those didn’t seem to be heaped upon.

To describe this phenomenon, I hereby invent the Decadally-Biased Marriage Recall index, or DBMR. This is 10-times the number of people married in years ending in 0, divided by the number of people married in all years (starting with a 6-year and ending with a 5-year). The ratio is multiplied by 100 to make it comparable to the Whipple index.

The DBMR for this figure (years 1936-2015) is 110.8. So there are 1.108-times as many people in those decadal years as you would expect from a continuous year function.
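
For anyone who wants to try this on their own counts, here is a minimal Python sketch of the index as defined above. The function name, the default year range, and the toy counts are assumptions for illustration; the real counts come from the weighted ACS files.

```python
def dbmr(counts_by_year, first_year=1936, last_year=2015):
    """Decadally-Biased Marriage Recall index.

    100 * 10 * (people married in years ending in 0) / (people married in all years),
    over a span that starts with a 6-year and ends with a 5-year.
    """
    in_span = {y: n for y, n in counts_by_year.items() if first_year <= y <= last_year}
    total = sum(in_span.values())
    decadal = sum(n for y, n in in_span.items() if y % 10 == 0)
    return 100 * 10 * decadal / total


# Toy data: the same number of marriages every year gives an index of 100.
flat = {year: 1000 for year in range(1936, 2016)}
print(dbmr(flat))

# Toy data with extra marriages reported in decadal years gives an index above 100.
heaped = {year: (1200 if year % 10 == 0 else 1000) for year in range(1936, 2016)}
print(round(dbmr(heaped), 1))
```

Starting on a 6-year and ending on a 5-year keeps exactly one decadal year in every ten, which is what makes 100 the no-heaping baseline.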

Maybe people really do get married more in decadal years. I was surprised to see a large heap at 2000, which is very recent so you might think there was good recall for those weddings. Maybe people got married that year because of the millennium hoopla. When you end the series at 1995, however, the DBMR is still 110.6. So maybe some people who would have gotten married at the end of 1999 waited till New Year’s Day or something, or rushed to marry on New Year’s Eve 2000, but that’s not the issue.

Maybe this has to do with who is answering the survey. Do you know what year your parents got married? If you answered the survey for your household, and someone else lives with you, you might round off. This is worth pursuing. I restricted the sample to just those who were householders (the person in whose name the home is owned or rented), and still got a DBMR of 110.7. But that might not be the best test.

Another possibility is that people who started living together before they were married — which is most Americans these days — don’t answer YRMARR with their legal marriage date, but some rounded-off cohabitation date. I don’t know how to test that.

Anyway, something to think about.

Filed under Research reports

Theology majors marry each other a lot, but business majors don’t (and other tales of BAs and marriage)

The American Community Survey collects data on the college majors of people who’ve graduated college. This excellent data has lots of untapped potential for family research, because it tells us something about people’s character and experience that we don’t have from any other variables in this massive annual dataset. (It even asks about a second major, but I’m not getting into that.)

To illustrate this, I did two data exercises that combine college major with marital events, in this case marriage. Looking at people who just married in the previous year, and college major, I ask: Which majors are most and least likely to marry each other, and which majors are most likely to marry people who aren’t college graduates?

I combined eight years of the ACS (2009-2016), which gave me a sample of 27,806 college graduates who got married in the year before they were surveyed (to someone of the other sex). Then I cross-tabbed the major of wife and major of husband, and produced a table of frequencies. To see how majors marry each other, I calculated a ratio of observed to expected frequencies in each cell on the table.

Example: With weights (rounding here), there were a total of 2,737,000 BA-BA marriages. I got 168,000 business majors marrying each other, out of 614,000 male and 462,000 female business majors marrying altogether. So I figured the expected number of business-business pairs was the proportion of all marrying men that were business majors (.22) times the number of women that were business majors (461,904), for an expected number of 103,677 pairs. Because there were 168,163 business-business pairs, the ratio is 1.6. (When I got the same answer flipping the genders, I figured it was probably right, but if you’ve got a different or better way of doing it, I wouldn’t be surprised!)
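
The same arithmetic as a minimal Python sketch, using the figures quoted in the example. The function name is made up, and because the count of marrying men is the rounded 614,000, the expected number comes out slightly different from the 103,677 above, though the ratio still rounds to 1.6.

```python
def observed_to_expected(observed_pairs, husbands_in_major, wives_in_major, total_marriages):
    """Ratio of observed same-major pairs to the number expected under random matching."""
    expected = husbands_in_major / total_marriages * wives_in_major
    return observed_pairs / expected, expected


ratio, expected = observed_to_expected(
    observed_pairs=168_163,      # business-business marriages
    husbands_in_major=614_000,   # marrying men with business majors (rounded)
    wives_in_major=461_904,      # marrying women with business majors
    total_marriages=2_737_000,   # all BA-BA marriages (rounded)
)
print(round(expected), round(ratio, 1))

# Flipping the genders gives the same expected count, since it is just
# husbands * wives / total either way.
flipped_expected = 461_904 / 2_737_000 * 614_000
```

A ratio of 1 would mean a major pairs off with itself no more often than chance; the heat map's diagonal is this ratio for each major.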

It turns out business majors, which are the most numerous of all majors (sigh), have the lowest tendency to marry each other of any major pair. The most homophilous major is theology, where the ratio is a whopping 31. (You have to watch out for the very small cells though; I didn’t calculate confidence intervals.) You can compare them with the rest of the pairs along the diagonal in this heat map (generated with conditional formatting in Excel):

spouse major matching

Of course, not all people with college degrees marry others with college degrees. In the old days it was more common for a man with higher education to marry a woman without than the reverse. Now that more women have BAs, I find in this sample that 35% of the women with BAs married men without BAs, compared to just 22% of BA-wielding men who married “down.” But the rates of down-marriage vary a lot depending on what kind of BA people have. So I made the next figure, which shows the proportion of male and female BAs, by major, marrying people without BAs (with markers scaled to the size of each major). At the extreme, almost 60% of the female criminal justice majors who married ended up with a man without a BA (quite a bit higher than the proportion of male crim majors who did the same). On the other hand, engineering had the lowest overall rate of down-marriage. Is that a good thing about engineering? Something people should look at!

spouse matching which BAs marry down

We could do a lot with this, right? If you’re interested in this data, and the code I used, I put up data and Stata code zips for each of these analyses (including the spreadsheet): BA matching, BA’s down-marrying. Free to use!


Filed under Research reports

No, early marriage is not more common for college graduates

Update: IFS has taken down the report I critiqued here, and put up a revised report. They have added an editor’s note, which doesn’t mention me or link to this post:

Editor’s Note: This post is an update of a post published on March 14, 2018. The original post looked at marriage trends by education among all adults under age 25. It gave the misimpression that college graduates were more likely to be married young nowadays, compared to non-college graduates.

At the Institute for Family Studies, Director of Research Wendy Wang has a post up with the provocative title, “Early Marriage is Now More Common For College Graduates” (linking to the Internet Archive version).

She opens with this:

Getting married at a young age used to be more common among adults who didn’t go to college. But the pattern has reversed in the past decade or so. In 2016, 9.4% of college graduates ages 18 to 24 have ever been married, which is higher than the share among their peers without a college degree (7.9%), according to my analysis of the most recent Census data.

And then the dramatic conclusion:

“What this finding shows is that even at a young age, college-educated adults today are more likely than their peers without a college degree to be married. And this is new.”

That would be new, and surprising, if it were true, but it’s not.

Here’s the figure that supports the conclusion:


It shows that 9.4% of college graduates in the age range 18-24 have been married, compared with 7.9% of those who did not graduate from college. (The drop has been faster for non-graduates, but I’m setting aside the time trend for now.) Honestly, I guess you could say, based on this, that young college graduates are more likely than non-graduates to “be married,” but not really.

The problem is there are very very few college graduates in the ages 18-19. The American Community Survey, which they used here, reports only about 12,000 in the whole country, compared with 8.7 million people without college degrees ages 18-19 (this is based on the public use files that IPUMS.org uses, which is what I use in the analysis below). Wow! There are lots and lots of non-college graduates below age 20 (including almost everyone who will one day be a college graduate!), and very few of them are married. So it looks like the marriage rate is low for the group 18-24 overall. Here is the breakdown by age and marital status for the two groups, less than BA education and BA or higher education, on the same population scale to help illustrate the point:


If you pool all the years together, you get a higher marriage rate for the college graduates, mostly because there are so few college graduates in the younger ages when hardly anyone is married.
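To make the pooling problem concrete, here is a minimal Python sketch. The counts are invented for illustration (they are not the ACS estimates), and the function names are my own. It shows how one group can have the lower marriage rate at every age and still come out higher once the ages are pooled, simply because it has almost no one in the youngest ages.

```python
# Hypothetical counts (NOT ACS estimates), chosen only to illustrate pooling.
# Each age group maps to (population, number ever married).
NON_GRADS = {
    "18-19": (8_000_000, 80_000),
    "20-22": (9_000_000, 720_000),
    "23-24": (5_000_000, 1_000_000),
}
GRADS = {
    "18-19": (10_000, 50),
    "20-22": (1_000_000, 50_000),
    "23-24": (3_000_000, 450_000),
}


def age_specific_rates(groups):
    """Share ever married within each age group."""
    return {age: married / pop for age, (pop, married) in groups.items()}


def pooled_rate(groups):
    """Share ever married after pooling all age groups together."""
    total_pop = sum(pop for pop, _ in groups.values())
    total_married = sum(married for _, married in groups.values())
    return total_married / total_pop


if __name__ == "__main__":
    non_grad_rates = age_specific_rates(NON_GRADS)
    grad_rates = age_specific_rates(GRADS)
    for age in NON_GRADS:
        print(age, round(non_grad_rates[age], 3), round(grad_rates[age], 3))
    print("pooled", round(pooled_rate(NON_GRADS), 3), round(pooled_rate(GRADS), 3))
```

In this made-up example the graduates have the lower rate in every age group, but the pooled rate flips because the non-graduates are weighted so heavily toward the youngest ages.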

To show the whole thing in terms of marriage rates, here is the marital status for the two groups at every age from 15 (when ACS starts asking about marital status) to 54.


Ignoring 19-21, where there are a tiny number of college graduates, you see a much more sensible pattern: college graduates delay marriage longer, but then have higher rates at older ages (starting at age 28), for all the reasons we know marriage is ultimately more common among college graduates. In fact, if you used ages 15-24 (why not?), you would get an even bigger difference, with 9.4% of college graduates married and just 5.7% of non-college graduates. Why not? In fact, what about ages 0-24? It would make almost as much sense.

Another way to do this is just to look at 24-year-olds. Since we’re talking about the ever-married status, and mortality is low at these ages, this is a case where the history is implied in the cross-sectional data. At age 24, as the figure shows, 19.9% of non-college graduates have been married, compared with 12.9% of college graduates. Early marriage is not more common for college graduates.

In general, I don’t recommend comparing college graduates and non-graduates, at least in cross-sectional data, below age 25. Lots of people are still finishing college below age 25 (and increasingly after that age as well). There is also an important issue of endogeneity here, which always makes education and age analysis tricky: some people (mostly women) don’t finish college because they get married and have children.

Anyway, it looks to me like someone working for a pro-marriage organization saw what seemed like a story implying marriage is good (that’s why college graduates do it, after all), and one that also fits with the do-what-I-say-not-what-I-do criticism of liberals, who are supposedly not promoting marriage among poor people while they themselves love to get married (a critique made by Charles Murray, Brad Wilcox, and others). And, before thinking it through, they published it.

Mistakes happen. Fortunately, I dislike the Institute for Family Studies (see the whole series under this tag), and so I read it and pointed out this problem within a couple hours (first on Twitter, less than two hours after Wang tweeted it). It’s a social media post-publication peer review success story! If they correct it.


Filed under Research reports

For social relationships outside marriage

Stephanie Coontz has a great piece in tomorrow’s New York Times titled, “For a Better Marriage, Act Like a Single Person.” From her intro:

Especially around Valentine’s Day, it’s easy to find advice about sustaining a successful marriage, with suggestions for “date nights” and romantic dinners for two. But as we spend more and more of our lives outside marriage, it’s equally important to cultivate the skills of successful singlehood. And doing that doesn’t benefit just people who never marry. It can also make for more satisfying marriages.

From there she develops the case with, as usual, a lot of the right research. Well worth a read.

Stephanie used two empirical bits from my work:

No matter how much Americans may value marriage, we now spend more time living single than ever before. In 1960, Americans were married for an average of 29 of the 37 years between the ages of 18 and 55. That’s almost 80 percent of what was then regarded as the prime of life. By 2015, the average had dropped to only 18 years.

In many ways, that’s good news for marriages and married people. Contrary to some claims, marrying at an older age generally lowers the risk of divorce. It also gives people time to acquire educational and financial assets, as well as develop a broad range of skills — from cooking to household repairs to financial management — that will stand them in good stead for the rest of their lives, including when a partner is unavailable.

The first figure, the average years spent in marriage between the ages of 18 and 55, is very easy to calculate. You just sum the proportion of people married at each age. Here’s what it looks like, comparing 1960 (from the decennial Census) and 2015 (from the American Community Survey), both from IPUMS.org:


I think it’s a nice, simple way to show the declining footprint of marriage in American life. (I first did this, and described the rationale, in 2010.)
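As a rough sketch of that calculation in Python (this is not the code behind the figure, and the function name and example profile are hypothetical): add up the share married at each single year of age from 18 through 54, which covers the 37 years between 18 and 55.

```python
def years_married(share_married_by_age, start_age=18, end_age=55):
    """Expected years spent married between start_age and end_age.

    share_married_by_age maps each single year of age to the share of
    people currently married at that age. Ages start_age..end_age-1 are summed.
    """
    return sum(share_married_by_age[age] for age in range(start_age, end_age))


# Hypothetical profile (not Census or ACS data): nobody is married before
# age 25, and half of people are married at every age from 25 to 54.
example_profile = {age: (0.0 if age < 25 else 0.5) for age in range(18, 55)}

if __name__ == "__main__":
    print(years_married(example_profile))
```

If everyone were married at every age, the sum would be the full 37 years. The made-up profile above gives 15.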

The bit about older age at marriage being associated with lower odds of divorce is from this post. Here’s the result, showing odds of divorce in one year by age at marriage, with controls for duration, education, race/ethnicity, and nativity, for women in their first marriages:
Divorce by age at marriage

There’s more discussion in the post, as well as in this followup post, which has this cool figure, where red is the highest odds of divorce and green is the lowest, and the axes are years married and age at marriage:

Divorce By Age And Duration

My new book is out! Enduring Bonds: Inequality, Marriage, Parenting, and Everything Else That Makes Families Great and Terrible. Available all the usual places, plus here at the University of California Press, where Chapter 1 is available as a sample, and where instructors can request a review copy.

Filed under Research reports

Data analysis: Are older newlyweds saving marriage?

Is the “institution” still in decline if the incidence of marriage rebounds, but only at older ages?

In my new book I’ve revisited old posts and produced this figure, which shows the refined marriage rate* from 1940 to 2015, with a discussion of possible futures:


The crash scenario, showing marriage ending around 2050, is there to show where the 1950-2014 trajectory is headed (it’s also a warning against using linear extrapolation to predict the future). The rebound scenario is intended to show how unrealistic the “revive marriage culture” people are. The taper scenario emerges as the most likely alternative; in fact, it’s grown more likely since I first made the figure a few years ago, as you can see by the 2010-2014 jag.

So let’s consider the tapering scenario more substantively — what would it look like? One way to get a declining marriage rate is if marriage is increasingly delayed, even if it doesn’t become less common; people still marry, but later. (If everyone got married at age 99, we would have universal marriage and a very low refined marriage rate.) I give some evidence for this scenario here.

These trends are presented with minimal discussion; I’m not looking at race/ethnicity or social class, childbearing or the recession; I’m not discussing divorce and remarriage and cohabitation, and I’m not testing hypotheses. (This is a list of research suggestions!) To make the subject more enticing as a research topic (and for accountability), I’ve shared the Census data, Stata code, and spreadsheet file used to make this post in this OSF project. You can use anything there you want. You can also easily fork the project — that is, make a duplicate of its contents, which you then own, and take off on your own trajectory, by adding to or modifying them.


For some context, here is the trend in percentage of men and women ever married, by age, from 1960. (“Ever married” means currently married, separated, divorced, or widowed.) This clearly shows both life-course delay and lifetime decline, but delay is much more prominent, at least so far. Even now, almost 90% of people have been married by age 60 or so, while the marriage rates for people under 35 have plummeted.


People become ever-married when they get first-married. We measure ever-married prevalence from a survey question on current marital status, but first-marriage incidence requires a question like the one the American Community Survey asks: “In the past 12 months, did this person get married?” Because the survey also asks how many times each person has been married, you can calculate a first marriage rate with this ratio:

(once married & married in the past 12 months) / (never married + (once married & married in the past 12 months))
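Here is that ratio as a small Python function, just to pin down the numerator and denominator. The function and argument names are hypothetical, and the counts in the example are invented. In practice the inputs would be weighted counts for one age-sex group.

```python
def first_marriage_rate(newly_first_married, never_married):
    """First marriages in the past 12 months per person at risk.

    newly_first_married: people married exactly once whose marriage
        happened in the past 12 months.
    never_married: people who have never been married.
    The people who just married are added back to the denominator because
    they were never married (at risk) at the start of the period.
    """
    at_risk = never_married + newly_first_married
    if at_risk == 0:
        raise ValueError("no one at risk of first marriage")
    return newly_first_married / at_risk


if __name__ == "__main__":
    # Invented counts: 50 newly first-married, 950 still never married.
    print(first_marriage_rate(50, 950))
```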

Until recently it hasn’t been easy to measure first-marriage across all ages; now that we have the ACS marital events data (since 2008) we can. This allows us to look at the timing of first marriage, which means we can use current age-specific first-marriage rates to project lifetime ever-married rates under current conditions.

Here are the first-marriage rates for men and women, by age. Each set of bars shows the trend from 2008 to 2016. The left side shows men, by age; the right side shows women, by age; the totals for men and women are in the middle. This shows that first-marriage rates have fallen for men and women under age 35, but increased for those over age 35. The total first-marriage rate has rebounded from the 2013 crater, but is still lower than 2008.


This is a short-range trend, 9 years. It could be recession-specific, with people delaying marriage because of hardships, or relationships falling apart under economic stress, and then hurrying to marry a few years later. But it also fits the long-term trend of delay over decline.

The overall rates for men and women show that the 2014-2016 rebound has not brought first-marriage rates back to their 2008 level. However, what about lifetime odds of marriage? The next figure uses women’s age-specific first-marriage rates to project lifetime odds of marriage for three years: 2008, the 2013 crater, and 2016. This shows, for example, that at 2008 rates 59% of women would have married by age 30, compared with 53% in both 2013 and 2016.


The 2013 and 2016 lines diverge after age 30, and by age 65 the projected lifetime ever-married rates have fully recovered. This implies that marriage has been delayed, but not forgone (or denied).
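For readers who want to see the mechanics of a projection like this, here is a minimal Python sketch of one common way to do it: a synthetic cohort that starts out never married and is exposed to one year's age-specific first-marriage rates in sequence. It assumes no deaths and treats each rate as an annual probability; the function name and the rates in the example are hypothetical, and the code in the OSF project may differ in details.

```python
def projected_ever_married(first_marriage_rates, start_age, end_age):
    """Projected share ever married by each exact age.

    first_marriage_rates maps single year of age to the annual probability
    that a never-married person of that age marries for the first time.
    Returns a dict: exact age -> cumulative share ever married by that age.
    """
    never_married = 1.0  # the whole synthetic cohort starts never married
    ever_by_age = {}
    for age in range(start_age, end_age):
        never_married *= 1.0 - first_marriage_rates[age]
        ever_by_age[age + 1] = 1.0 - never_married
    return ever_by_age


if __name__ == "__main__":
    # Invented rates: 10% of never-married people marry at each age 20-29.
    rates = {age: 0.10 for age in range(20, 30)}
    projection = projected_ever_married(rates, 20, 30)
    print(round(projection[30], 3))
```

Running this once per year of rates (2008, 2013, 2016) and comparing the curves is the kind of comparison described above.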

Till now I’ve shown age and sex-specific rates, but haven’t addressed other things that might have changed in the never-married population. Finally, I estimated logistic regressions predicting first-marriage among never married men and women. The models include race, Hispanic origin, nativity, education, and age. In addition to the year and age patterns above, the models show that all other racial groups have lower rates than Whites, Hispanics have lower rates than non-Hispanics, foreign-born people have higher rates (which explains the Hispanic result), and people with more education first-marry more (code and results in the OSF project).

To see whether changes in these other variables change the story, I used the regressions to estimate first-marriage rates at the overall mean of all variables. These show a significant rebound from the bottom, but not returning to 2008 levels, quite similar to the unadjusted trends above:


This is all consistent with the taper scenario described at the top. Marriage is delayed, which reduces the annual marriage rate, but later marriage picks up much of the slack, so that the decline in lifetime marriage prevalence is modest.

* The refined marriage rate is the number of marriages as a fraction of unmarried people. This is more informative than the crude marriage rate (which the National Center for Health Statistics tracks), which is marriages as a fraction of the total population. In this post I use what I guess you would call an age-specific refined first-marriage rate, defined above.

Filed under Research reports

Science finds tiny things nowadays (Malia edition)

We have to get used to living in a world where science — even social science — can detect really small things. Understanding how important really small things are, and how to interpret them, is harder nowadays than just finding them.

Remember when Hanna Rosin wrote this?

One of the great crime stories of the last twenty years is the dramatic decline of sexual assault. Rates are so low in parts of the country — for white women especially — that criminologists can’t plot the numbers on a chart.

Besides being wrong about rape (it has declined a lot, but it’s still high compared with most countries), this was a funny statement about science (I’ve heard we can even plot negative numbers now!). But the point is we have problems understanding, and communicating about, small things.

So, back to names.

In 2009, the peak year for the name Malia in the U.S., 1,681 girls were given that name, according to the Social Security Administration, or .041% of the 4.14 million children born that year (there are no male Malias in the SSA’s public database, meaning they have never recorded more than 4 in one year). That year, 7.5% of women ages 18-44 had a baby. Say you know 100 women ages 18-44, and each of them knows 100 others (and there is no overlap in your network). If my arithmetic is right, that would mean there is about a 30% chance one of your 10,000 friends of a friend had a baby girl and named her Malia in 2009. But probably there is a lot of overlap; if your friend-of-friend network is only 1,000 women 18-44 then that chance would fall to about 3%.
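Here is the friend-of-a-friend arithmetic as a short Python sketch, using only the three numbers quoted above. The function name is hypothetical, and it assumes each woman's outcome is independent and that each woman has at most one baby that year.

```python
MALIAS_2009 = 1_681       # girls named Malia in 2009 (from the post)
BIRTHS_2009 = 4_140_000   # children born that year (from the post)
SHARE_WITH_BABY = 0.075   # share of women 18-44 who had a baby (from the post)


def malia_chances(network_size):
    """Return (expected Malias, chance of at least one) in a network.

    Assumes independence across women and no overlap in the network.
    """
    p_per_woman = SHARE_WITH_BABY * MALIAS_2009 / BIRTHS_2009
    expected = network_size * p_per_woman
    p_at_least_one = 1 - (1 - p_per_woman) ** network_size
    return expected, p_at_least_one


if __name__ == "__main__":
    for size in (10_000, 1_000):
        expected, chance = malia_chances(size)
        print(size, round(expected, 3), round(chance, 3))
```

With 10,000 women the expected number of new Malias is about 0.30. Under these assumptions the chance of at least one is a little lower, roughly 26%, because some of that expected count comes from networks with two or more. With 1,000 women both numbers are about 3%.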

Here is the trend in girls named Malia, relative to the total number of girls born, from 1960 to 2016:


To make it easier to see the Malias, here is the same chart with the y-axis on a log scale.


This shows that Malia has been on a long upward trend, from less than 50 per year in the 1960s to more than 1,000 per year now. And it also shows a pronounced spike in 2009, the year Malia peaked at .041%. In that year, the number of people naming daughters Malia jumped 75% before declining over the next three years to resume its previous trend. Here is the detail on the figure, just showing the Malias in 2005-2016:


What happened there? We can’t know for sure. Even if you asked everyone why they named their kid what they did, I don’t know what answers you would get. But from what we know about naming patterns, and their responsiveness to names in the news (positive or negative), it’s very likely that the bump in 2009 resulted from the high profile of Barack Obama and his daughter Malia, who was 11 when Obama was elected.

What does a causal statement like that really mean? In 2009, it looks to me like about 828 more people named their daughters Malia than would have otherwise, taking into account the upward trend before 2008. Here’s the actual trend, with a simulated trend showing no Obama effect:


Of course, Obama’s election changed the world forever, which may explain why the upward trend for Malia accelerated again after 2013. But in this simple simulation, which brings the “no Obama” trend back into line with the actual trend in 2014, there were 1,275 more Malias born than there would have been without the Obama election. This implies that over the years 2008-2013, the Obama election increased the probability of someone naming their daughter Malia by .00011, or .011%.
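To show the logic of a simulation like this, here is a minimal Python sketch. It is one simple way to build a "no Obama" trend, not necessarily the one behind the figure: draw a straight line from the last pre-election year to the year the trends rejoin, sum the gap between actual and counterfactual counts, and divide by the number of girls born in those years. Every number in the example is invented, including the birth total, and the function names are hypothetical.

```python
def linear_counterfactual(actual, last_pre_year, rejoin_year):
    """Straight line between the actual counts in two anchor years.

    Returns counterfactual counts for the years strictly between them.
    """
    span = rejoin_year - last_pre_year
    slope = (actual[rejoin_year] - actual[last_pre_year]) / span
    return {
        year: actual[last_pre_year] + slope * (year - last_pre_year)
        for year in range(last_pre_year + 1, rejoin_year)
    }


def excess_names(actual, counterfactual):
    """Total names given above the counterfactual trend."""
    return sum(actual[year] - counterfactual[year] for year in counterfactual)


# Hypothetical counts (NOT the SSA data), for illustration only.
ACTUAL = {2007: 1000, 2008: 1100, 2009: 1700, 2010: 1500,
          2011: 1400, 2012: 1350, 2013: 1300, 2014: 1350}
GIRLS_BORN_2008_2013 = 12_000_000  # hypothetical denominator

if __name__ == "__main__":
    trend = linear_counterfactual(ACTUAL, 2007, 2014)
    extra = excess_names(ACTUAL, trend)
    print(extra, extra / GIRLS_BORN_2008_2013)
```

The last line is the step from an excess count to a change in probability: extra names divided by the girls who could have received the name.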

That is a very small effect. I think it’s real, and very interesting. But what does it mean for anything else in the world? This is not a question of statistical significance, although those tools can help. (These names aren’t a probability sample; they’re a list of all names given.) So this is a question for interpreting research findings now that we have these incredibly powerful tools, and very big data to analyze with them. The number alone doesn’t tell the story.


Filed under Me @ work