Tag Archives: replication

Sociology’s culture of trust, don’t verify

Replication in sociology is a disaster. There basically isn’t any. Accountability is something a select few opt into; as a result, it is mostly people with nothing to hide whose work ever gets verified or replicated. Even when work is easily replicable, such as work using publicly available datasets, there is no common expectation that anyone will do it, and no support for doing it; basically no one funds or publishes replications.

Peer review is good, but it’s not about replicability, because it almost always relies on the competence and good faith of the authors. Reviewers might say, “This looks funny, did you try this or that?” But if the author says, “Yes, I did that,” that’s usually the end of it. Academic sociology, in short, runs on a system of trust. That’s worth exactly what it’s worth. It doesn’t have to be this way.

I thought of this today when I read the book excerpt by Mark Regnerus in the Wall Street Journal. (I haven’t read his new book, Cheap Sex, yet, although I called the basic arguments a “big ball of wrong” three years ago when he first published them.) Regnerus opens that essay with a single quote, supposedly from an anonymous 24-year-old recent college graduate, that absolutely perfectly represents his thesis:

If you know what girls want, then you know you should not give that to them until the proper time. If you do that strategically, then you can really have anything you want…whether it’s a relationship, sex, or whatever. You have the control.

(Regnerus argues men have recently gained control over sex because women have stopped demanding marriage in exchange for it.)

Scholars and readers in sociology don’t normally question whether specific quotes in qualitative research are real or not. We argue over the interpretation, or elements of the research design that might call the interpretation into question (such as the method of selecting respondents or a field site). But if we simply don’t trust the author, what do we do? In the case of Regnerus, we know that he has lied, a lot, about important things related to his research. So how do you read his research in a discipline with no norm of verification or replicability, a discipline naively based on trust? The fake news era is here; we have to address this. Fortunately, every other social discipline already is, so we don’t have to reinvent the wheel.

Tackling it

Of course there are complicated issues with different kinds of sociology, especially qualitative work. It’s one of the things people wrestled with in the Contexts forum Syed Ali and I organized for the American Sociological Association on how to do ethnography right.

That forum took place in the wake of all the attention Alice Goffman received for her book, and article, On the Run (my posts on that are under this tag). One person who followed that controversy closely was law professor Steven Lubet, who has written a new book titled “Interrogating Ethnography: Why Evidence Matters,” which addresses that situation in depth. The book comes out October 20, at a conference at Northwestern University’s law school. I will be one of a number of people commenting on the book and its implications.


I hope you can come to the event in Chicago.

Finally, regardless of your opinion on recent controversies in sociology, if you haven’t read it, I urge you to read (and, if you’re in such a position, require that your students read) “Replication in Social Science,” by Jeremy Freese and David Peterson, in the latest Annual Review of Sociology (SocArXiv preprint; journal version). Freese and Peterson refer to sociology as “the most undisciplined social science,” and they write:

As sociologists, the most striking thing in reviewing recent developments in social science replication is how much all our neighbors seem to be talking and doing about improving replicability. Reading economists, it is hard not to connect their relatively strict replication culture with their sense of importance: shouldn’t a field that has the ear of policy-makers do work that is available for critical inspection by others? The potential for a gloomy circle ensues, in which sociology would be more concerned with replication and transparency if it was more influential, but unwillingness to keep current on these issues prevents it from being more influential. In any case, the integrative and interdisciplinary ambitions of many sociologists are obviously hindered by the field’s inertness on these issues despite the growing sense in nearby disciplines that they are vital to ensuring research integrity.

That paper has some great ideas for easy reforms to start out with. But we need to get the conversation moving. In addition to developing replication standards and norms, we need to get the next generation of sociologists some basic training in the (jargon alert!) political economy of scholarly communication and the publishing ecosystem. The individual incentives are weak, but the need for the discipline to act is very strong. If we can at least get sociologists to be vaguely aware of the attention to this issue generated in most other social science disciplines, it would be a great step forward.

Incidentally, Freese will also present on the topic of replication at the O3S: Open Scholarship for the Social Sciences symposium SocArXiv is hosting at the University of Maryland later this month; still time to register!


Filed under In the news

Stop me before I fake again

In light of the news on social science fraud, I thought it was a good time to report on an experiment I did. I realize my results are startling, and I welcome the bright light of scrutiny that such findings might now attract.

The following information is fake.

An employee training program in a major city promises basic job skills as well as job search assistance for people with a high school degree and no further education, ages 23-52 in 2012. Due to an unusual staffing practice, new applications were for a period in 2012 allocated at random to one of two caseworkers. One provided the basic services promised but nothing extra. The other embellished his services with extensive coaching on such “soft skills” as “mainstream” speech patterns, appropriate dress for the workplace, and a hard work ethic, among other elements. The program surveyed the participants in 2014 to see what their earnings were in the previous 12 months. The data provided to me does not include any information on response rates, or any information about those who did not respond. And it only includes participants who were employed at least part-time in 2014. Fortunately, the program also recorded which staff member each participant was assigned to.

Since this provides such an excellent opportunity for studying the effects of soft skills training, I think it’s worth publishing despite these obvious weaknesses. To help with the data collection and analysis, I got a grant from Big Neoliberal, a non-partisan foundation.

The data includes 1040 participants, 500 of whom had the bare-bones service and 540 of whom had the soft-skills add-on, which I refer to as the “treatment.” These are the descriptive statistics:


As you can see, the treatment group had higher earnings in 2014. The difference in logged annual earnings between the two groups is statistically significant.
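That group comparison amounts to a simple two-sample t-test. In Stata it would look something like this (a sketch; the variable names newlnwage and treatment are assumed from the replication file):

```
* compare mean logged earnings across the two caseworker groups
ttest newlnwage, by(treatment)
```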


As you can see in Model 1, the Black workers in 2014 earned significantly less than the White workers. This gap of .15 logged earnings points, or about 15%, is consistent with previous research on the race wage gap among high school graduates. Model 2 shows that the treatment training apparently was effective, raising earnings about 11%. However, the interactions in Model 3 confirm that the benefits of the treatment were concentrated among the Black workers. The non-Black workers did not receive a significant benefit, and the treatment effect among Black workers basically wiped out the race gap.
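For concreteness, the three models described above would look something like this in Stata (a sketch; the variable names newlnwage, black, and treatment are assumed from the replication file):

```
* Model 1: race gap in logged earnings
reg newlnwage black

* Model 2: add the treatment indicator
reg newlnwage black treatment

* Model 3: treatment-by-race interaction (## expands to main effects plus interaction)
reg newlnwage black##treatment
```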

The effects are illustrated, with predicted values, in this figure:


Soft skills are awesome.

I have put the data file, in Stata format, here.


What would you do if you saw this in a paper or at a conference? Would you suspect it was fake? Why or why not?

I confess I never seriously thought of faking a research study before. In my day coming up in sociology, people didn’t share code and datasets much (it was never compulsory). I always figured if someone was faking they were just changing the numbers on their tables to look better. I assumed this happens to some unknown, and unknowable, extent.

So when I heard about the LaCour & Green scandal, I thought whoever did it was tremendously clever. But when I looked into it more, I thought it was not such rocket science. So I gave it a try.


I downloaded a sample of adults 25-54 from the 2014 ACS via IPUMS, with annual earnings, education, age, sex, race and Hispanic origin. I set the sample parameters to meet the conditions above, and then I applied the treatment, like this:

First, I randomly selected the treatment group:

* draw a uniform random number for each case and split the sample at .5
gen temp = runiform()
gen treatment = 0
replace treatment = 1 if temp >= .5
drop temp

Then I generated the basic effect, and the Black interaction effect:

* draw noisy person-level effects, in log-earnings points
gen effect = rnormal(.08,.05)   // base treatment effect
gen beffect = rnormal(.15,.05)  // extra effect for Black workers

Starting with the logged wage variable, lnwage, I added the basic effect to all the treated subjects:

* copy the outcome first; replace requires an existing variable
gen newlnwage = lnwage
replace newlnwage = newlnwage + effect if treatment==1

Then I added the Black interaction effect to the treated Black subjects, and subtracted it from the non-treated ones.

replace newlnwage = newlnwage+beffect if (treatment==1 & black==1)
replace newlnwage = newlnwage-beffect if (treatment==0 & black==1)

This isn’t ideal, but when I just added the effect I didn’t have a significant Black deficit in the baseline model, so that seemed fishy.

That’s it. I spent about 20 minutes trying different parameters for the fake effects, trying to get them to seem reasonable. The whole thing took about an hour (not counting the write-up).

I put the complete fake files here: code, data.

Would I get caught for this? What are we going to do about this?


In the comments, ssgrad notices that if you exponentiate (unlog) the incomes, you get a funny list — some are binned at whole numbers, as you would expect from a survey of incomes, and some are random-looking and go out to multiple decimal places. For example, one person reports an even $25,000, and another supposedly reports $25251.37. This wouldn’t show up in the descriptive statistics, but is kind of obvious in a list. Here is a list of people with incomes between $20000 and $26000, broken down by race and treatment status. I rounded to whole numbers because even without the decimal points you can see that the only people who report normal incomes are non-Blacks in the non-treatment group. Busted!
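The check ssgrad describes is easy to script. Here is a sketch, assuming the fake file’s variable names (newlnwage for logged earnings, black and treatment as indicators): unlog the earnings and flag values that are not whole dollar amounts.

```
* unlog the earnings variable
gen inc2014 = exp(newlnwage)

* real survey respondents report round dollar amounts;
* fabricated values carry stray decimals from the added noise
gen suspicious = abs(inc2014 - round(inc2014)) > .001

* fabricated cases should cluster by race and treatment status
tab suspicious treatment, by(black)
```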

So, that only took a day — with a crowd-sourced team of thousands of social scientists poring over the replication file. Faith in the system restored?


Filed under In the news, Research reports