Data analysis shows Journal Impact Factors in sociology are pretty worthless

The impact of Impact Factors

Some of this first section is lifted from my blockbuster report, Scholarly Communication in Sociology, where you can also find the references.

When a piece of scholarship is first published it’s not possible to gauge its importance immediately unless you are already familiar with its specific research field. One of the functions of journals is to alert potential readers to good new research, and the placement of articles in prestigious journals is a key indicator.

Since at least 1927, librarians have been using the number of citations to the articles in a journal as a way to decide whether to subscribe to that journal. More recently, bibliographers introduced a standard method for comparing journals, known as the journal impact factor (JIF). This requires data for three years, and is calculated as the number of citations in the third year to articles published over the two prior years, divided by the total number of articles published in those two years.

For example, in American Sociological Review there were 86 articles published in the years 2017-18, and those articles were cited 548 times in 2019 by journals indexed in Web of Science, so the JIF of ASR is 548/86 = 6.37. This allows for a comparison of impact across journals. Thus, the comparable calculation for Social Science Research is 531/271 = 1.96, and it’s clear that ASR is a more widely-cited journal. However, comparisons of journals in different fields using JIFs are less helpful. For example, the JIF for the top medical journal, New England Journal of Medicine, is currently 75, because there are many more medical journals publishing and citing more articles at higher rates, and more quickly than do sociology journals. (Or maybe NEJM is just that much more important.)
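The JIF arithmetic is simple enough to sketch in a few lines. Here is a minimal Python illustration, using only the ASR and Social Science Research numbers quoted above (Python just for illustration; the analysis in this post uses Stata):

```python
def jif(citations_in_year3, articles_in_years1_2):
    """Journal impact factor: citations in year 3 to articles
    published in years 1 and 2, divided by the article count."""
    return citations_in_year3 / articles_in_years1_2

# Figures quoted above: 2019 JIFs for ASR and Social Science Research
asr = jif(548, 86)
ssr = jif(531, 271)
print(round(asr, 2))  # 6.37
print(round(ssr, 2))  # 1.96
```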

In addition to complications in making comparisons, there are problems with JIFs (besides the obvious limitation that citations are only one possible evaluation metric). They depend on what journals and articles are in the database being used. And they mostly measure short-term impact. Most important for my purposes here, however, is that they are often misused to judge the importance of articles rather than journals. That is, if you are a librarian deciding what journal to subscribe to, JIF is a useful way of knowing which journals your users might want to access. But if you are evaluating a scholar’s research, knowing that they published in a high-JIF journal does not mean that their article will turn out to be important. It is especially wrong to look at an article that’s old enough to have citations you could count (or not) and judge its quality by the journal it’s published in — but people do that all the time.

To illustrate this, I gathered citation data from the almost 2,500 articles published in 2016-2019 in 15 sociology journals from the Web of Science category list.* In JIF these rank from #2 (American Sociological Review, 6.37) to #46 (Social Forces, 1.95). I chose these to represent a range of impact factors, and because they are either generalist journals (e.g., ASR, Sociological Science, Social Forces) or sociology-focused enough that almost any article they publish could have been published in a generalist journal as well. Here is a figure showing the distribution of citations to those articles as of December 2020, by journal, ordered from higher to lower JIF.

After ASR, Sociology of Education, and American Journal of Sociology, it’s hard to see much of a slope here. Outliers might be playing a big role (for example that very popular article in Sociology of Religion, “Make America Christian Again: Christian Nationalism and Voting for Donald Trump in the 2016 Presidential Election,” by Whitehead, Perry, and Baker in 2018). But there’s a more subtle problem, which is the timing of the measures. My collection of articles is 2016-2019. The JIFs I’m using are from 2019, based on citations to 2017-2018 articles. These journals bounce around; for example, Sociology of Religion jumped from 1.6 to 2.6 in 2019. (I address that issue in the supplemental analysis below.) So what is a lazy promotion and tenure committee, which is probably working off a mental reputation map at least a dozen years old, to do?

You can already tell where I’m going with this: In these sociology journals, there is so much noise in citation rates within the journals, compared to any stable difference between them, that outside the very top the journal ranking won’t much help you predict how much a given paper will be cited. If you assume a paper published in AJS will be more important than one published in Social Forces, you might be right, but if the odds that you’re wrong are too high, you just shouldn’t assume anything. Let’s look closer.

Sociology failure rates

I recently read this cool paper (also paywalled, in the Journal of Informetrics) that estimates this “failure probability”: the odds that your guess about which paper will be more impactful, based on the journal title, turns out to be wrong. When JIFs are similar, the odds of an error are very high, like a coin flip. “In two journals whose JIFs are ten-fold different, the failure probability is low,” Brito and Rodríguez-Navarro conclude. “However, in most cases when two papers are compared, the JIFs of the journals are not so different. Then, the failure probability can be close to 0.5, which is equivalent to evaluating by coin flipping.”

Their formulas look pretty complicated to me, so for my sociology approach I just did it by brute force (or if you need tenure you could call it a Monte Carlo approach). I randomly sampled 100,000 times from each possible pair of journals, then calculated the percentage of times the article with more citations was from a journal with a higher impact factor. For example, in 100,000 comparisons of random pairs sampled from ASR and Social Forces (the two journals with the biggest JIF spread), 73% of the time the ASR article had more citations.
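The brute-force procedure can be sketched as follows. This is a hedged Python illustration (the actual analysis used Stata and the real Web of Science citation counts; the lists below are made-up numbers, not data from the sample):

```python
import random

def failure_rate(cites_a, cites_b, draws=100_000, seed=0):
    """Draw random article pairs from two journals' citation-count
    lists, where A is the higher-JIF journal. Return the share of
    pairs where the A article does NOT have more citations than
    the B article (ties count as failures here)."""
    rng = random.Random(seed)
    wins = sum(rng.choice(cites_a) > rng.choice(cites_b)
               for _ in range(draws))
    return 1 - wins / draws

# Hypothetical citation counts for two journals (illustration only)
journal_a = [5, 12, 30, 8, 45, 2, 19]   # higher-JIF journal
journal_b = [3, 7, 10, 4, 22, 1, 9]
print(failure_rate(journal_a, journal_b))
```

When the two distributions overlap heavily, as real journals' citation distributions do, this rate drifts toward 0.5, the coin flip.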

Is 73% a lot? It’s better than a coin toss, but I’d hate to have a promotion or hiring decision be influenced by an instrument that blunt. Here are results of the 10.5 million comparisons I made (I love computers). Click to enlarge:

Outside of the ASR column, these are very bad; in the ASR column they’re pretty bad. For example, a random article from AJS only has more citations than one from the 12 lower-JIF journals 59% of the time. So if you’re reading CVs, and you see one candidate with a two-year-old AJS article and one with a two-year-old Work & Occupations article, what are you supposed to do? You could compare the actual citations the two articles have gotten, or you could assess their quality of impact some other way. You absolutely should not just skim the CV and assume the AJS article is or will be more influential based on the journal title alone; the failure probability of that assumption is too high.

On my table you can also see some anomalies, of the kind which plague this system. See all that brown in the BJS and Sociology of Religion columns? That’s because both of those journals had sudden increases in their JIF, so their more recent articles have more citations, and most of the comparisons in this table (like in your memory, probably) are based on data from a few years before that. People who published in these journals three years ago are today getting an undeserved JIF bounce from having these titles on their CVs. (See the supplemental analysis below for more on this.)


Using JIF to decide which papers in different sociology journals are likely to be more impactful is a bad idea. Of course, lots of people know JIF is imperfect, but they can’t help themselves when evaluating CVs for hiring or promotion. And when you show them evidence like this, they might say “but what is the alternative?” But as Brito & Rodríguez-Navarro write: “if something were wrong, misleading, and inequitable the lack of an alternative is not a cause for continuing using it.” These error rates are unacceptably high.

In sociology most people won’t own up to relying on impact factors, but most people (in my experience) do judge research by where it’s published all the time. If there is a very big difference in status — enough to be associated with an appreciably different acceptance rate, for example — that’s not always wrong. But it’s a bad default.

In 2015 the biologist Michael Eisen suggested that tenured faculty should remove the journal titles from their CVs and websites, and just give readers the title of the paper and a link to it. He’s done it for his lab’s website, and I urge you to look at it just to experience the weightlessness of an academic space where for a moment overt prestige and status markers aren’t telling you what to think. I don’t know how many people have taken him up on it. I did it for my website, with the explanation, “I’ve left the titles off the journals here, to prevent biasing your evaluation of the work before you read it.” Whatever status I’ve lost I’ve made up for in virtue-signaling self-satisfaction — try it! (You can still get the titles from my CV, because I feel like that’s part of the record somehow.)

Finally, I hope sociologists will become more sociological in their evaluation of research — and of the systems that disseminate, categorize, rank, and profit from it.

Supplemental analysis

The analysis thus far is, in my view, a damning indictment of real-world reliance on the Journal Impact Factor for judging articles, and thus the researchers who produce them. However, it conflates two problems with the JIF. First is the statistical problem of imputing status from an aggregate to an individual, when the aggregate measure fails to capture variation that is very wide relative to the difference between groups. Second, more specific to JIF, is the reliance on a very time-specific comparison: citations in year three to publications in years one and two. Someone could do (maybe already has) an analysis to determine the best lag structure for JIF to maximize its predictive power, but the conclusions from the first problem imply that’s a fool’s errand.

Anyway, in my sample the second problem is clearly relevant. My analysis relies strictly on the rank-ordering provided by the JIF to determine whether article comparisons succeed or fail. However, the sample I drew covers four years, 2016-2019, and counts citations to all of them through 2020. This difference in time window produces a rank ordering that differs substantially (the rank order correlation is .73), as you can see:

In particular, three journals (BJS, SOR, and SFO) moved more than five spots in the ranking. A glance at the results table above shows that these journals are dragging down the matching success rate. To pull these two problems apart, I repeated the analysis using the ranking produced within the sample itself.

The results are now much more straightforward. First, here is the same box plot but with the new ordering. Now you can see the ranking more clearly, though you still have to squint a little.

And in the match rate analysis, the result is now driven by differences in means and variances rather than by the mismatch between JIF and sample-mean rankings (click to enlarge):

This makes a more logical pattern. The most differentiated journal, ASR, has the highest success rate, and the journals closest together in the ranking fail the most. However, please don’t take from this that such a ranking becomes a legitimate way to judge articles. The overall average on this table is still only 58%, up only 4 points from the original table. Even with a ranking that more closely conforms to the sample, this confirms Brito and Rodríguez-Navarro’s conclusion: “[when rankings] of the journals are not so different … the failure probability can be close to 0.5, which is equivalent to evaluating by coin flipping.”

These match numbers are too low to responsibly use in such a way. These major sociology journals have citation rates that are too variable, and too similar at the mean, to be useful as a way to judge articles. ASR stands apart, but only because of the rest of the field. Even judging an ASR paper against its lower-ranked competitors produces a successful one-to-one ranking of papers just 72% of the time — and that only rises to 82% with the least-cited journal on the list.

The supplemental analysis is helpful for differentiating the multiple problems with JIF, but it does nothing to solve the problem of using journal citation rates to evaluate individual articles.

*The data and Stata code I used are up here: This includes the lists of all articles in the 15 journals from 2016 to 2020 and their citation counts as of the other day (I excluded 2020 papers from the analysis, but they’re in the lists). I forgot to save the version of the 100k-case random file that I used to do this, so I guess that can never be perfectly replicated; but you can probably do it better anyway.

Framing social class with sample selection

A lot of qualitative sociology makes comparisons across social class categories. Many researchers build class into their research designs by selecting subjects using broad criteria, most often education level, income level, or occupation. Depending on the set of questions at hand, the class selection categories will vary, focusing on, for example, upbringing and socialization, access to resources, or occupational outlook.

In the absence of a substantive review, here are a few arbitrarily selected exemplar books from my areas of research:

This post was inspired by the question Caitlyn Collins asked the other day on Twitter:

She followed up by saying, “Social class is nebulous, but precision here matters to make meaningful claims. What do we mean when we say we’re talking to poor, working class, middle class, wealthy folks? I’m looking for specific demographic questions, categories, scales sociologists use as screeners.” The thread generated a lot of good ideas.

Income, education, occupation

Screening people for research can be costly and time-consuming, so you want to maximize simplicity as well as clarity. Here’s a way of looking at some common screening variables, and what you might get or lose by relying on them in different combinations. This uses the 2018 American Community Survey (Stata data file and code here).

  • I used income, education, and occupation to identify the status of individuals, and generated household class categories by the presence or absence of types of people in each. That means everyone in each household is in the same class category (a choice you might or might not want to make).
  • Income: Total household income divided by an equivalency scale (for cost of living). The scale counts each adult as 1 person and each child under 18 as .70, then raises that count to the .70 power. I divided the resulting distribution into thirds, so households are in the top, middle, or bottom third. Top third is what I called “middle/upper” class, bottom third is “lower class.”
  • Education: I use BA degree to identify households that have (middle/upper) or don’t (lower) a four-year college graduate present. This is 31% of adults.
  • Occupation: I used the 2018 ACS occupation codes, and coded people as middle/upper class if their code was 10 to 3550, which are management, business, and financial occupations; computer, engineering, and science occupations; education, legal, community service, arts, and media occupations; and healthcare practitioners and technical occupations. It’s pretty close to what we used to call “managerial and professional” occupations. Together, these account for 37% of workers.
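The income adjustment in the bullets above can be sketched as follows. This is a hedged Python illustration of the equivalency scale as described (the actual work was done in Stata; the dollar figures are made up):

```python
def adjusted_income(hh_income, n_adults, n_children):
    """Household income divided by an equivalency scale:
    each adult counts as 1, each child under 18 as .70,
    and the summed count is raised to the .70 power."""
    scale = (n_adults + 0.70 * n_children) ** 0.70
    return hh_income / scale

# Hypothetical example: $60,000 household, 2 adults, 1 child
print(round(adjusted_income(60_000, 2, 1)))
```

Households are then sorted into thirds of this adjusted distribution, as described above.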

So each of these three variables identifies an upper/middle class status of about a third of people.

For lower class status, you can just reverse them. The exception is income, which is in three categories. For that, I counted households as lower class if their household income was in the bottom third of the adjusted distribution. In the figures below, that means they’re neither middle/upper class nor lower class if they’re in the middle of the income distribution. This is easily adjusted.

Venn diagrams

You can make Venn diagrams in Stata using the pvenn2 add-on, which I naturally discovered after making these. If you must know, I made these by generating tables in Stata, downloading this free plotter app, entering the values manually, copying the resulting figures into PowerPoint and applying the text there, then printing them to PDF, and extracting the images from PDF using Photoshop. Not a recommended workflow.

Here they are. I hope the visuals might help people think about, for example, who they might get if they screened on just one of these variables, or how unusual someone is who has a high income or occupation but no BA, and so on. But draw your own conclusions (and feel free to modify the code and follow your own approach). Click to enlarge.

First middle/upper class:

Venn diagram of overlapping class definitions

Then lower class:

Venn diagram of overlapping class definitions.

I said draw your own conclusions, but please don’t draw the conclusion that I think this is the best way to define social class. That’s a whole different question. This is just about simple ways to select people to be research subjects. For other posts on social class, follow this tag, which includes this post about class self identification by income and race/ethnicity.

Data and code:

COVID-19 code, data, codebooks, figures

Every day for who knows how long I’ve tinkered with COVID-19 data and made graphs using Stata. Now I’ve condensed my tools down to several elements, updated daily, which I’m sharing:

  • A program that assembles the COVID death and case data, by date, at the county, state, and country level. To this I have added some population, income, and political variables. The program is here, along with the codebook it outputs.
  • The data file is here in Stata format and CSV format. It’s in long shape, so one record for each place on each date.
  • A Stata program that makes my favorite graphs right now (currently 24 per day). The figures are stored here in PNG format.
  • The Stata scheme I use to make them look the way I like is here.

These files are linked to my laptop so they update automatically when I revise them. Yay, Open Science Framework, which is non-profit, open source, free to use, and deserves your support.

I hope someone finds these helpful, for teaching or exploring on their own. It’s all yours.

Here are a few figures from today’s runs (click to enlarge):

counties with any cases

deaths and GDP scatter

The arriving divorce decline

In “The Coming Divorce Decline” I showed the U.S. divorce rate falling from 2008 to 2017, and predicted that, because the married population was being stocked with increasingly non-divorce-prone marriages, the rate would continue to fall. After the first draft (based on 2016 data), divorce fell in 2017, providing the first support for my prediction before the paper was even “published” (accepted for Socius). Now the 2018 data is out, and divorce has become less common still.

Here’s a quick update.

Based on the number of divorces reported in the survey each year, by sex, and the number of married people, I calculate the refined divorce rate: the number of divorces per 1,000 married people. That fell another 3% for both women and men in 2018, to 15.9 and 14.3 respectively (the rates differ because these are self-reports and women report more).
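The refined divorce rate is simple arithmetic; here is a minimal sketch (Python just for illustration, with a made-up denominator that reproduces the 2018 rate for women):

```python
def refined_divorce_rate(divorces, married_people):
    """Divorces in a year per 1,000 married people."""
    return 1000 * divorces / married_people

# Hypothetical illustration: 159 divorces among 10,000 married women
print(refined_divorce_rate(159, 10_000))  # 15.9
```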


When I run the model from the paper again on the new data (on women only), I can show the drop in the adjusted odds of divorce, updating Figure 1 of the paper (the 2018 change in an unadjusted model is significant at p=.06; adjusted is p=.14, the adjusted change from 2016 is significant at p=.002).


For other takes on the latest data, see this report on the marriage-divorce ratio from Valerie Schweizer, and this on geographic variation from Colette Allred, both at the National Center for Family and Marriage Research.

  • The data and code for the paper are available here. This update uses the same code with one new year of data.
  • If you like my new Stata figure scheme (modified from Gray Kimbrough’s Uncluttered) you’re welcome to it: here.
  • Slides from my presentation this fall at the European Divorce Conference are here.
  • Divorce posts are gathered under this tag.

Why we need open science in demography, and how we can make it happen

“Why we need open science in demography, and how we can make it happen” is the title of a talk I gave at the Max Planck Institute for Demographic Research yesterday, as part of an open science workshop they hosted in Rostock, Germany. (The talk was not nearly as definitive as the title.)

The other (excellent) keynote was by Monica Alexander. I posted the slides from my talk here. There should be a video available later. The organizing committee for the event is working to raise the prominence of open science discussions at the Institute, and consider practices and policies they might adopt. We had a great meeting.

As an aside, I also got to hear an excellent tutorial by E. F. Haghish, who has written Markdoc, a “literate programming” (markdown) package for Stata, which is very cool. These are his slides.

rostock talk 2; rostock group shot

That thing where you have a lot of little graphs (single-parent edition)

Yesterday I was on an author-meets-critics panel for The Triple Bind of Single-Parent Families: Resources, Employment, and Policies to Improve Well-Being, a new collection edited by Rense Nieuwenhuis and Laurie Maldonado. The book is excellent — and it’s available free under a Creative Commons license.

Most of the chapters are comparative, with data from multiple countries. I like looking at the figures, especially the ones like this, which give a quick general sense and let you see anomalies and outliers. I made a couple, too, which I share below, with code.


Here’s an example, showing the proportion of new births to mothers who aren’t married, by education, for U.S. states. For this I used the 2012-2016 combined American Community Survey file. I created a sample extract that included only women who reported having a child in the previous year, which gives me about 177,000 cases over the five years. The only other variables are state, education, and marital status. I put the raw data file on the Open Science Framework here. Code below.

My first attempt was bar graphs for each state. This is easiest because Stata lets you do graph means with the bar command (click to enlarge).

marst fertyr educ by state

The code for this is very simple. I made a dummy variable for single, so the mean of that is the proportion single. Edcat is a four-category education variable.

gr bar (mean) single [weight=perwt], over(edcat) bar(1,color(green)) yti("Proportion not married") by(state)

The bar graph is easy, and good for scanning the data for weird cases or interesting stories. But maybe it isn’t ideal for presentation, because the bars run from one state to the next. Maybe little lines would be better. This takes another step, because it requires making the graph with twoway, which doesn’t want to calculate means on the fly. So I do a collapse to shrink the dataset down to just means of single by state and edcat.

collapse (mean) single psingle=single [fw=perwt], by(state edcat)

Then I use a scatter graph, with line connectors between the dots. I like this better:

marst fertyr educ by state lines

You can see the overall levels (e.g., high in DC, low in Utah) as well as the different slopes (flatter in New York, steeper in South Dakota), and it’s still clear that the single-mother incidence is lowest in every state for women with BA degrees.

Here’s the code for that graph. Note the weights are now baked into the means so I don’t need them in the graph command. And to add the labels to the scatter plot you have to specify you want that. Still very simple:

gr twoway scatter single edcat , xlab(1 2 3 4, valuelabel) yti("Proportion not married") lcolor(green) msymbol(O) connect(l) by(state)

Sadly, I can’t figure out how to put one title and footnote on the graph, rather than a tiny title and footnote on every state graph, so I left titles out of the code and I then added them by hand in the graph editor. Boo.

Here’s the full code:

set more off

quietly infix ///
 byte statefip 1-2 ///
 double perwt 3-12 ///
 byte marst 13-13 ///
 byte fertyr 14-14 ///
 byte educ 15-16 ///
 int educd 17-19 ///
 using "[PATHNAME]\usa_00366.dat"

/* the sample is all women who reported having a child in the previous year, FERTYR==2 */
replace perwt = perwt / 100

format perwt %10.2f

label var statefip "State (FIPS code)"
label var perwt "Person weight"
label var marst "Marital status"
label var educd "Educational attainment [detailed version]"

label define statefip_lbl 01 "Alabama"
label define statefip_lbl 02 "Alaska", add
label define statefip_lbl 04 "Arizona", add
label define statefip_lbl 05 "Arkansas", add
label define statefip_lbl 06 "California", add
label define statefip_lbl 08 "Colorado", add
label define statefip_lbl 09 "Connecticut", add
label define statefip_lbl 10 "Delaware", add
label define statefip_lbl 11 "District of Columbia", add
label define statefip_lbl 12 "Florida", add
label define statefip_lbl 13 "Georgia", add
label define statefip_lbl 15 "Hawaii", add
label define statefip_lbl 16 "Idaho", add
label define statefip_lbl 17 "Illinois", add
label define statefip_lbl 18 "Indiana", add
label define statefip_lbl 19 "Iowa", add
label define statefip_lbl 20 "Kansas", add
label define statefip_lbl 21 "Kentucky", add
label define statefip_lbl 22 "Louisiana", add
label define statefip_lbl 23 "Maine", add
label define statefip_lbl 24 "Maryland", add
label define statefip_lbl 25 "Massachusetts", add
label define statefip_lbl 26 "Michigan", add
label define statefip_lbl 27 "Minnesota", add
label define statefip_lbl 28 "Mississippi", add
label define statefip_lbl 29 "Missouri", add
label define statefip_lbl 30 "Montana", add
label define statefip_lbl 31 "Nebraska", add
label define statefip_lbl 32 "Nevada", add
label define statefip_lbl 33 "New Hampshire", add
label define statefip_lbl 34 "New Jersey", add
label define statefip_lbl 35 "New Mexico", add
label define statefip_lbl 36 "New York", add
label define statefip_lbl 37 "North Carolina", add
label define statefip_lbl 38 "North Dakota", add
label define statefip_lbl 39 "Ohio", add
label define statefip_lbl 40 "Oklahoma", add
label define statefip_lbl 41 "Oregon", add
label define statefip_lbl 42 "Pennsylvania", add
label define statefip_lbl 44 "Rhode Island", add
label define statefip_lbl 45 "South Carolina", add
label define statefip_lbl 46 "South Dakota", add
label define statefip_lbl 47 "Tennessee", add
label define statefip_lbl 48 "Texas", add
label define statefip_lbl 49 "Utah", add
label define statefip_lbl 50 "Vermont", add
label define statefip_lbl 51 "Virginia", add
label define statefip_lbl 53 "Washington", add
label define statefip_lbl 54 "West Virginia", add
label define statefip_lbl 55 "Wisconsin", add
label define statefip_lbl 56 "Wyoming", add
label define statefip_lbl 61 "Maine-New Hampshire-Vermont", add
label define statefip_lbl 62 "Massachusetts-Rhode Island", add
label define statefip_lbl 63 "Minnesota-Iowa-Missouri-Kansas-Nebraska-S.Dakota-N.Dakota", add
label define statefip_lbl 64 "Maryland-Delaware", add
label define statefip_lbl 65 "Montana-Idaho-Wyoming", add
label define statefip_lbl 66 "Utah-Nevada", add
label define statefip_lbl 67 "Arizona-New Mexico", add
label define statefip_lbl 68 "Alaska-Hawaii", add
label define statefip_lbl 72 "Puerto Rico", add
label define statefip_lbl 97 "Military/Mil. Reservation", add
label define statefip_lbl 99 "State not identified", add
label values statefip statefip_lbl

label define educd_lbl 000 "N/A or no schooling"
label define educd_lbl 001 "N/A", add
label define educd_lbl 002 "No schooling completed", add
label define educd_lbl 010 "Nursery school to grade 4", add
label define educd_lbl 011 "Nursery school, preschool", add
label define educd_lbl 012 "Kindergarten", add
label define educd_lbl 013 "Grade 1, 2, 3, or 4", add
label define educd_lbl 014 "Grade 1", add
label define educd_lbl 015 "Grade 2", add
label define educd_lbl 016 "Grade 3", add
label define educd_lbl 017 "Grade 4", add
label define educd_lbl 020 "Grade 5, 6, 7, or 8", add
label define educd_lbl 021 "Grade 5 or 6", add
label define educd_lbl 022 "Grade 5", add
label define educd_lbl 023 "Grade 6", add
label define educd_lbl 024 "Grade 7 or 8", add
label define educd_lbl 025 "Grade 7", add
label define educd_lbl 026 "Grade 8", add
label define educd_lbl 030 "Grade 9", add
label define educd_lbl 040 "Grade 10", add
label define educd_lbl 050 "Grade 11", add
label define educd_lbl 060 "Grade 12", add
label define educd_lbl 061 "12th grade, no diploma", add
label define educd_lbl 062 "High school graduate or GED", add
label define educd_lbl 063 "Regular high school diploma", add
label define educd_lbl 064 "GED or alternative credential", add
label define educd_lbl 065 "Some college, but less than 1 year", add
label define educd_lbl 070 "1 year of college", add
label define educd_lbl 071 "1 or more years of college credit, no degree", add
label define educd_lbl 080 "2 years of college", add
label define educd_lbl 081 "Associates degree, type not specified", add
label define educd_lbl 082 "Associates degree, occupational program", add
label define educd_lbl 083 "Associates degree, academic program", add
label define educd_lbl 090 "3 years of college", add
label define educd_lbl 100 "4 years of college", add
label define educd_lbl 101 "Bachelors degree", add
label define educd_lbl 110 "5+ years of college", add
label define educd_lbl 111 "6 years of college (6+ in 1960-1970)", add
label define educd_lbl 112 "7 years of college", add
label define educd_lbl 113 "8+ years of college", add
label define educd_lbl 114 "Masters degree", add
label define educd_lbl 115 "Professional degree beyond a bachelors degree", add
label define educd_lbl 116 "Doctoral degree", add
label define educd_lbl 999 "Missing", add
label values educd educd_lbl

recode educd (0/61=1) (62/64=2) (65/90=3) (101/116=4), gen(edcat)

label define edlbl 1 "<HS"
label define edlbl 2 "HS", add
label define edlbl 3 "SC", add
label define edlbl 4 "BA+", add
label values edcat edlbl

label define marst_lbl 1 "Married, spouse present"
label define marst_lbl 2 "Married, spouse absent", add
label define marst_lbl 3 "Separated", add
label define marst_lbl 4 "Divorced", add
label define marst_lbl 5 "Widowed", add
label define marst_lbl 6 "Never married/single", add
label values marst marst_lbl

gen married = marst==1 /* this is married spouse present */
gen single=marst>3 /* this is divorced, widowed, and never married */

gr bar (mean) single [weight=perwt], over(edcat) bar(1,color(green)) yti("Proportion not married") by(state)

collapse (mean) single psingle=single [fw=perwt], by(state edcat)

gr twoway scatter single edcat , xlab(1 2 3 4, valuelabel) yti("Proportion not married") lcolor(green) msymbol(O) connect(l) by(state)



Donald is not the biggest loser (among winning and losing names)

From 2015 to 2016 there was a 10% drop in U.S. boys given the name Donald at birth, from 690 to 621, plunging the name from 900th to 986th in the overall rankings. Here is the trend in Donalds born from 1880 to 2016, shown on a log scale, from the Social Security names database.


That 2016 drop is relatively big in percentage terms, but the name has been dropping an average of 6% per year since 1957 (it dropped 26% in the 8 years after the introduction of Donald Duck in 1934). I really wish it were a popular name, so we could more easily see whether the rise of Donald Trump is a factor in this. With so few new Donalds, and the name already trending downward, there’s no way to tell if Trump fanatics may be counterbalancing regular people turned off to the name.

Stability over change

How big is a fall of 69 births, which seems so trivial in relation to the 3.9 million children born last year? Among names with more than 5 births in each year, only 499 fell more, compared with 26,052 that fell less or rose. So Donald is definitely a loser.

But I am always amazed at how little change there is in most names from year to year. It sounds obvious to describe a trend as rising or falling, but names are scarily regular in their annual changes given that the statistics from one year to the next reflect independent decisions by separate people who overwhelmingly don’t know each other.

Here is a way of visualizing the change in the number of babies given each name, from 2015 to 2016. There is one dot for each name. Those below the diagonal had a decrease in births, those above had an increase; the closer to the line, the less change there was. (To adjust for the 1% drop in total births, these are shown as births per 1,000 total born.)

[Figure: 2015-2016 count change, all names]

No name had a change of more than 1700 births this year (Logan dropped 1697, a drop of 13%; Adeline increased 1700, or 71%). There just isn’t much movement. I find that remarkable. (Among top names, James stands out this year: 14,773 born in 2015, rising by 3 to 14,776 in 2016.)

Here’s a look at the top right corner of that figure, just showing names with 3 per 1,000 or more births in either 2015 or 2016:

[Figure: 2015-2016 count change, names with 3+ births per 1,000]

Note that most of these top names became less popular in 2016 (below the diagonal). That fits the long-term trend, well known by now, for names to become less popular over time, which means name diversity is increasing. I described that in the history chapter of my textbook, The Family, and in this old blog post from 2011. (This great piece by Tristan Bridges explores why there is more diversity among female names, as you can see from the fact that they are outnumbered among the top names shown here.)

Anyway, since I did it, here are the top 20 winners and losers, in numerical terms, in 2016. Wow, look at that catastrophic 21% drop in girls given the name Alexa (thanks, Amazon). I don’t know what’s up with Brandon and Blake. Your explanations will be as good as mine for these.



For the whole series of name posts on this blog, follow the names tag, which includes a bunch on the name Mary.

Here’s the Stata code I used for the figures and tables (not including the long-term Donald trend). The dataset is in a zip file at Social Security, here. There is a separate file for each year. The code below runs on the two latest files: yob2015.txt and yob2016.txt.

import delimited [path]\yob2016.txt
sort v2 v1
rename v3 count16
save "[path]\n16.dta", replace
import delimited [path]\yob2015.txt, clear
sort v2 v1
rename v3 count15
merge 1:1 v2 v1 using [path]\n16.dta
drop _merge

gen pctchg = 100*(count16-count15)/count15
drop if pctchg==. /* drops cases that don't appear in both years (5+ names) */

gen countchg = count16-count15
rename v2 sex
rename v1 name

gsort -count16
gen rank16 = _n

gsort -count15
gen rank15 = _n

gsort -countchg
gen riserank=_n

gsort countchg
gen fallrank=_n

gen rankchg = rank15-rank16

format pctchg %9.1f 
format count15 count16 countchg %15.0fc

gen prop15 = (count15/3978497)*1000 /* these are births per 1000, based on NCHS birth report for 15 & 16 */
gen prop16 = (count16/3941109)*1000

*winners table
sort riserank
list sex name count15 count16 countchg pctchg rank15 rank16 rankchg in 1/20, sep(0)

*losers table
sort fallrank
list sex name count15 count16 countchg pctchg rank15 rank16 rankchg in 1/20, sep(0)

*figure for all names
twoway (scatter prop16 prop15 if sex=="M", mc(blue) m(Oh) mlw(vvthin)) (scatter prop16 prop15 if sex=="F" , m(Oh) mc(pink) mlw(vvthin))

*figure for top names
twoway (scatter prop16 prop15 if sex=="M" & (prop15>=3 | prop16>=3), ml(name) ms(i) mlabp(0)) (scatter prop16 prop15 if sex=="F" & (prop15>=3 | prop16>=3), ml(name) ms(i) mlabp(0))

Marriage and gender inequality in 124 countries

Countries with higher levels of marriage have higher levels of gender inequality. This isn’t a major discovery, but I don’t remember seeing this illustrated before, so I decided to do it. Plus I’m trying to improve my Stata graphing.

I used data from this U.N. report on marriage rates from 2008, restricted to those countries that had data from 2000 or later. To show marriage rates I used the percentage of women ages 30-34 that are currently married. This is thus a combination of marriage prevalence and marriage timing, which is something like the amount of marriage in the country. I got gender inequality from the U.N. Development Programme’s Human Development Report for 2015. The gender inequality index combines the maternal mortality ratio, the adolescent birth rate, the representation of women in the national parliament, the gender gap in secondary education, and the gender gap in labor market participation.

Here is the result. I labeled countries with 49 million population or more in red; a few interesting outliers are also labeled. The line is quadratic, unweighted for population (click to enlarge).
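The actual code is in the directory linked below, but a minimal Stata sketch of a figure like this might look as follows. The variable names (gii, pctmarried, pop, country) are my assumptions for illustration, not the names in the posted files.

* hedged sketch: scatter with a quadratic fit, big countries labeled in red
* (assumes pop is measured in millions)
twoway (scatter gii pctmarried if pop < 49, mc(gs10) m(Oh)) ///
    (scatter gii pctmarried if pop >= 49, mc(red) mlab(country) mlabc(red)) ///
    (qfit gii pctmarried, lc(black)), ///
    xti("Percent of women 30-34 currently married") ///
    yti("Gender inequality index") legend(off)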

You can see the USA sliding right down that curve toward gender nirvana (not that I’m making a simplistic causal argument).

Note that India and China together are about 36% of the world’s population. They both have nearly universal marriage by age 30-34, but women in China get married about four years later on average. That’s an important part of why China has lower gender inequality (it goes along with more educational access, higher employment levels, politics, history, etc.). China is a major outlier among universal-marriage countries, while India is right on the curve.

Any cross-national comparison has to handle this issue. China is 139 times bigger than Sweden. One way to address it is to weight the points by their relative population sizes. If you do that it actually doesn’t change the result much, except for China, which in this case changes everything, because in addition to being huge, China breaks the relationship between marriage and gender inequality. Here is the comparison. Now the dots are scaled for population, and the gray line is fit to all the countries except China, while the red line includes China (click to enlarge).
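A sketch of this weighted comparison, again with assumed (not actual) variable names: analytic weights scale the markers by population, and the two qfit overlays correspond to the gray line (excluding China) and the red line (all countries).

* hedged sketch: population-weighted markers and two quadratic fits
twoway (scatter gii pctmarried [aw=pop], mc(gs12) m(Oh)) ///
    (qfit gii pctmarried [aw=pop] if country != "China", lc(gs8)) ///
    (qfit gii pctmarried [aw=pop], lc(red)), ///
    xti("Percent of women 30-34 currently married") ///
    yti("Gender inequality index") legend(off)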

My conclusion is that the gray line is the basic story — more marriage, more gender inequality — with China as an important exception, but that’s up for interpretation.

I put the data and the code for making the charts in this directory. Feel free to copy and crib, etc.

Stop me before I fake again

In light of the news on social science fraud, I thought it was a good time to report on an experiment I did. I realize my results are startling, and I welcome the bright light of scrutiny that such findings might now attract.

The following information is fake.

An employee training program in a major city promises basic job skills as well as job search assistance for people with a high school degree and no further education, ages 23-52 in 2012. Due to an unusual staffing practice, new applications were for a period in 2012 allocated at random to one of two caseworkers. One provided the basic services promised but nothing extra. The other embellished his services with extensive coaching on such “soft skills” as “mainstream” speech patterns, appropriate dress for the workplace, and a hard work ethic, among other elements. The program surveyed the participants in 2014 to see what their earnings were in the previous 12 months. The data provided to me does not include any information on response rates, or any information about those who did not respond. And it only includes participants who were employed at least part-time in 2014. Fortunately, the program also recorded which staff member each participant was assigned to.

Since this provides such an excellent opportunity for studying the effects of soft skills training, I think it’s worth publishing despite these obvious weaknesses. To help with the data collection and analysis, I got a grant from Big Neoliberal, a non-partisan foundation.

The data includes 1040 participants, 500 of whom had the bare-bones service and 540 of whom had the soft-skills add-on, which I refer to as the “treatment.” These are the descriptive statistics:


As you can see, the treatment group had higher earnings in 2014. The difference in logged annual earnings between the two groups is statistically significant.


As you can see in Model 1, the Black workers earned significantly less in 2014 than the White workers. This gap of .15 logged earnings points, or about 15%, is consistent with previous research on the race wage gap among high school graduates. Model 2 shows that the treatment training apparently was effective, raising earnings about 11%. However, the interactions in Model 3 confirm that the benefits of the treatment were concentrated among the Black workers. The non-Black workers did not receive a significant benefit, and the treatment effect among Black workers basically wiped out the race gap.
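The models described here could be run roughly like this in Stata (a sketch; lnearn, black, and treatment are names assumed from the text, not taken from the posted file):

* Model 1: race gap in logged earnings
reg lnearn i.black
* Model 2: add the treatment indicator
reg lnearn i.black i.treatment
* Model 3: race-by-treatment interaction
reg lnearn i.black##i.treatment
* predicted earnings by group, for the figure
margins black#treatment
marginsplot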

The effects are illustrated, with predicted values, in this figure:


Soft skills are awesome.

I have put the data file, in Stata format, here.


What would you do if you saw this in a paper or at a conference? Would you suspect it was fake? Why or why not?

I confess I never seriously thought of faking a research study before. In my day coming up in sociology, people didn’t share code and datasets much (it was never compulsory). I always figured if someone was faking they were just changing the numbers on their tables to look better. I assumed this happens to some unknown, and unknowable, extent.

So when I heard about the LaCour & Green scandal, I thought whoever did it was tremendously clever. But when I looked into it more, I thought it was not such rocket science. So I gave it a try.


I downloaded a sample of adults 25-54 from the 2014 ACS via IPUMS, with annual earnings, education, age, sex, race and Hispanic origin. I set the sample parameters to meet the conditions above, and then I applied the treatment, like this:

First, I randomly selected the treatment group:

gen temp = runiform()
gen treatment=0
replace treatment = 1 if temp >= .5
drop temp

Then I generated the basic effect, and the Black interaction effect:

gen effect = rnormal(.08,.05)
gen beffect = rnormal(.15,.05)

Starting with the logged wage variable, lnwage, I added the basic effect to all the treated subjects:

gen newlnwage = lnwage
replace newlnwage = lnwage+effect if treatment==1

Then added the Black interaction effect to the treated Black subjects, and subtracted it from the non-treated ones.

replace newlnwage = newlnwage+beffect if (treatment==1 & black==1)
replace newlnwage = newlnwage-beffect if (treatment==0 & black==1)

This isn’t ideal, but when I just added the effect I didn’t have a significant Black deficit in the baseline model, so that seemed fishy.

That’s it. I spent about 20 minutes trying different parameters for the fake effects, trying to get them to seem reasonable. The whole thing took about an hour (not counting the write-up).

I put the complete fake files here: code, data.

Would I get caught for this? What are we going to do about this?


In the comments, ssgrad notices that if you exponentiate (unlog) the incomes, you get a funny list — some are binned at whole numbers, as you would expect from a survey of incomes, and some are random-looking and go out to multiple decimal places. For example, one person reports an even $25,000, and another supposedly reports $25,251.37. This wouldn’t show up in the descriptive statistics, but it is kind of obvious in a list. Here is a list of people with incomes between $20,000 and $26,000, broken down by race and treatment status. I rounded to whole numbers because even without the decimal points you can see that the only people who report normal incomes are non-Blacks in the non-treatment group. Busted!
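In Stata, a check along the lines ssgrad describes might look like this (a sketch, assuming the newlnwage, black, and treatment variables from the fake-data code above):

* unlog the earnings; real survey incomes cluster at round numbers
gen earn = exp(newlnwage)
* fabricated values run to many decimal places, so a simple list exposes them
list black treatment earn if earn >= 20000 & earn <= 26000, sep(0)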

So, that only took a day — with a crowd-sourced team of thousands of social scientists poring over the replication file. Faith in the system restored?