Stop me before I fake again

In light of the news on social science fraud, I thought it was a good time to report on an experiment I did. I realize my results are startling, and I welcome the bright light of scrutiny that such findings might now attract.

The following information is fake.

An employee training program in a major city promises basic job skills as well as job search assistance for people with a high school degree and no further education, ages 23-52 in 2012. Due to an unusual staffing practice, new applications were for a period in 2012 allocated at random to one of two caseworkers. One provided the basic services promised but nothing extra. The other embellished his services with extensive coaching on such “soft skills” as “mainstream” speech patterns, appropriate dress for the workplace, and a hard work ethic, among other elements. The program surveyed the participants in 2014 to see what their earnings were in the previous 12 months. The data provided to me does not include any information on response rates, or any information about those who did not respond. And it only includes participants who were employed at least part-time in 2014. Fortunately, the program also recorded which staff member each participant was assigned to.

Since this provides such an excellent opportunity for studying the effects of soft skills training, I think it’s worth publishing despite these obvious weaknesses. To help with the data collection and analysis, I got a grant from Big Neoliberal, a non-partisan foundation.

The data includes 1040 participants, 500 of whom had the bare-bones service and 540 of whom had the soft-skills add-on, which I refer to as the “treatment.” These are the descriptive statistics:


As you can see, the treatment group had higher earnings in 2014. The difference in logged annual earnings between the two groups is significant at p


As you can see in Model 1, the Black workers in 2014 earned significantly less than the White workers. This gap of .15 logged earnings points, or about 15%, is consistent with previous research on the race wage gap among high school graduates. Model 2 shows that the treatment training apparently was effective, raising earnings about 11%. However, the interactions in Model 3 show that the benefits of the treatment were concentrated among the Black workers. The non-Black workers did not receive a significant benefit, and the treatment effect among Black workers basically wiped out the race gap.
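The “about 15%” reading uses the usual approximation that a difference of d in logged earnings is roughly a d×100 percent difference. A quick Python check of the exact conversion, exp(d) − 1, shows how close that approximation is. The .15 gap comes from the text; the .11 treatment coefficient is an assumption inferred from “about 11%,” since the regression table itself is not reproduced here.

```python
import math


def log_points_to_percent(delta: float) -> float:
    """Exact percent change implied by a difference of `delta` in logged earnings."""
    return (math.exp(delta) - 1) * 100


# -0.15 is the race gap stated in the text; 0.11 is an assumed treatment coefficient.
for label, delta in [("race gap (Model 1)", -0.15), ("treatment (Model 2)", 0.11)]:
    print(f"{label}: {delta:+.2f} log points = {log_points_to_percent(delta):+.1f}%")
```

For gaps this small the approximation is within about a percentage point, so “about 15%” is a fair shorthand for an exact gap closer to 14%.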

The effects are illustrated, with predicted values, in this figure:


Soft skills are awesome.

I have put the data file, in Stata format, here.


What would you do if you saw this in a paper or at a conference? Would you suspect it was fake? Why or why not?

I confess I had never seriously thought of faking a research study before. In my day coming up in sociology, people didn’t share code and datasets much (it was never compulsory). I always figured if someone was faking they were just changing the numbers on their tables to look better. I assumed this happens to some unknown, and unknowable, extent.

So when I heard about the LaCour & Green scandal, I thought whoever did it was tremendously clever. But when I looked into it more, I thought it was not such rocket science. So I gave it a try.


I downloaded a sample of adults 25-54 from the 2014 ACS via IPUMS, with annual earnings, education, age, sex, race and Hispanic origin. I set the sample parameters to meet the conditions above, and then I applied the treatment, like this:

First, I randomly selected the treatment group:

gen temp = runiform()
gen treatment=0
replace treatment = 1 if temp >= .5
drop temp

Then I generated the basic effect, and the Black interaction effect:

gen effect = rnormal(.08,.05)
gen beffect = rnormal(.15,.05)

Starting with the logged wage variable, lnwage, I added the basic effect to all the treated subjects:

gen newlnwage = lnwage
replace newlnwage = lnwage+effect if treatment==1

Then added the Black interaction effect to the treated Black subjects, and subtracted it from the non-treated ones.

replace newlnwage = newlnwage+beffect if (treatment==1 & black==1)
replace newlnwage = newlnwage-beffect if (treatment==0 & black==1)

This isn’t ideal, but when I just added the effect I didn’t have a significant Black deficit in the baseline model, so that seemed fishy.

That’s it. I spent about 20 minutes trying different parameters for the fake effects, trying to get them to seem reasonable. The whole thing took about an hour (not counting the write-up).

I put the complete fake files here: code, data.

Would I get caught for this? What are we going to do about this?


In the comments, ssgrad notices that if you exponentiate (unlog) the incomes, you get a funny list: some are binned at whole numbers, as you would expect from a survey of incomes, and some are random-looking and go out to multiple decimal places. For example, one person reports an even $25,000, and another supposedly reports $25,251.37. This wouldn’t show up in the descriptive statistics, but is kind of obvious in a list. Here is a list of people with incomes between $20,000 and $26,000, broken down by race and treatment status. I rounded to whole numbers because even without the decimal points you can see that the only people who report normal incomes are non-Blacks in the non-treatment group. Busted!
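The check ssgrad did by eye can be sketched in Python: exponentiate each logged income and flag values that are not, within floating-point tolerance, whole dollars. The function name, the half-cent tolerance, and the sample records are illustrative assumptions, not values taken from the replication file.

```python
import math


def is_whole_dollar(log_income: float, tol: float = 0.005) -> bool:
    """True if the unlogged income is within `tol` dollars of a whole-dollar amount."""
    income = math.exp(log_income)
    return abs(income - round(income)) < tol


# Hypothetical records: (group, logged income). The shifted ones mimic adding a
# random effect on the log scale, as in the fake-treatment Stata code above.
records = [
    ("non-treated, non-Black", math.log(25_000)),
    ("non-treated, non-Black", math.log(22_000)),
    ("treated, Black", math.log(25_000) + 0.01),
    ("non-treated, Black", math.log(30_000) - 0.15),
]

for group, ln_income in records:
    flag = "" if is_whole_dollar(ln_income) else "  <- not whole dollars"
    print(f"{group:24s} ${math.exp(ln_income):>11,.2f}{flag}")
```

This catches only this particular slip: fabricated incomes that were rounded to whole dollars after the shift would pass.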

So, that only took a day, with a crowd-sourced team of thousands of social scientists poring over the replication file. Faith in the system restored?


My rejection of the National Marriage Project’s “Before ‘I Do'”

All day today, “The Decisive Marriage” has topped the New York Times most-emailed list. The piece, a Well blog post by Tara Parker-Pope, describes a report published by the National Marriage Project and written by Galena Rhoades and Scott Stanley, “Before ‘I Do’: What Do Premarital Experiences Have to Do with Marital Quality Among Today’s Young Adults?”

I have frequently criticized the National Marriage Project, run by Bradford Wilcox (posts listed under this tag), and I ignore their work when I can. But this report is getting a lot of attention now and several people have asked my opinion. Since the research in the report has not been subject to peer review, and the Parker-Pope piece does not include any expert commentary from non-authors, I figured I’d structure this post like the peer review report I would dash off if I had been asked to review the piece. (It’s a little different because I have access to the author and funding information, and I wouldn’t include links or graphics, but this is more or less how it would go.)

Before “I Do”

This paper reports results from an original data collection which sampled 1,294 people in 2007/08, and then followed an unknown number of them for five years. The present paper reports on the marriage quality of 418 of the individuals (ages 18-40) who reported marrying over the period. The authors provide no information on sample attrition or how this was handled in the analysis, or on the determinants of marriage within the sample. Although they claim (without evidence) that the sample was “reasonably representative of unmarried adults,” they note it is 65% female, so it’s obviously not representative. More importantly, the analysis sample is only those who married, which is highly select. Neither the sexual orientation of the respondents nor the gender composition of the couples is reported.

The outcome variable in the study is a reasonable measure of “marital quality” based on a four-item reduced-form version of the Dyadic Adjustment Scale (originally developed by Graham Spanier), which includes these items:

  • How often do you discuss or have you considered divorce, separation, or terminating your relationship?
  • In general, how often do you think that things between you and your partner are going well?
  • Do you confide in your mate?
  • Please circle the dot which best describes the degree of happiness, all things considered, of your relationship.

The authors provide no details on the coding of these items, but say the scale ranges from 0 to 21, and their sample included people who scored from 0 to 21. However, the mean was 16.5 and the standard deviation was 3.7, indicating a strong skew toward high scores. Inexplicably, for the presentation of results the authors dichotomize the dependent variable into those they classify as “higher quality,” the 40% of respondents who scored 19-21, versus everyone else (0-18). To defend this decision, the authors offer this non-explanation, which means exactly nothing:

This cut point was selected by inspection of the distribution. While it is somewhat arbitrary, we reasoned that these people are not just doing “above average” in their marriages, but are doing quite well.

The average marriage duration is not reported, but the maximum possible is 5 years, so we are talking about marriage quality very early in these marriages.

The main presentation of findings consists of bar graphs misleadingly labeled “Percent in Higher-Quality Marriages, by…” various independent variables. These are misleading because, according to the notes to these figures, “These percentages are adjusted for race/ethnicity, years of education, personal income, religiousness, and frequency of attendance at religious services.” Here is one:


The method for arriving at these “adjusted” percentages is not given. This apparently confused Parker-Pope, who reported them as unadjusted percentages, like this:

People who lived with another person before marrying also reported a lower-quality relationship. In that group, 35 percent had higher-quality marriages. Among those who had not lived with another romantic partner before marriage, 42 percent had higher-quality marriages.

The statistical significance of this difference is not reported. However, if this were a simple difference of proportions, the difference would not be statistically significant at conventional levels (with a sample of 418, 39% of whom lived with someone else before, the test for difference of proportions for .42 and .35 yields a z-score of 1.43, p=.15). The full report includes an appendix which says they used multilevel modeling, but the form of the regression is not specified. The regression table provided includes no fit statistics or variance components so the efficacy of the model cannot be evaluated.
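The z-score in the paragraph above can be reproduced with a standard pooled two-proportion test. This Python sketch assumes, as the paragraph does, that the adjusted percentages are treated as raw proportions, and that the 39% split of the 418 respondents rounds to 163 who had lived with someone else and 255 who had not.

```python
import math


def two_proportion_z(p1: float, n1: int, p2: float, n2: int) -> tuple[float, float]:
    """Pooled two-sample z-test for a difference of proportions; returns (z, two-sided p)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p_value


n_total = 418
n_prior = round(0.39 * n_total)  # lived with someone else before marrying
n_none = n_total - n_prior       # did not
z, p = two_proportion_z(0.35, n_prior, 0.42, n_none)
print(f"z = {z:.2f}, p = {p:.2f}")
```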

Regression says: Adding 100 people to the wedding party 5 times would not equal the effect on marital quality of not being Black.

Much is made here (and in the Parker-Pope article about these findings) of the wedding-size effect. That is, among married couples, those who reported bigger weddings had higher average marriage quality. The mean wedding size was 117. In the regression model, each additional wedding guest was associated with an increase in marriage quality (on the 0-21 scale) of .005. That is, if this were a real effect, adding 100 wedding guests would increase marital quality by half a point, or less than 1/7 of a standard deviation. For comparison, in the model, the negative effect of being Black (-2.69) is more than five times as large as the effect of a 100-guest swing in wedding attendance. (The issue of effect size did not enter into Parker-Pope’s description of the results.)
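The effect-size comparison is simple arithmetic, and this Python snippet checks it using only the figures reported in the paragraph (.005 points per guest, a standard deviation of 3.7, and a coefficient of -2.69 for Black); the variable names are illustrative.

```python
coef_per_guest = 0.005  # change in the 0-21 quality score per additional guest
sd_quality = 3.7        # standard deviation of the quality score
coef_black = -2.69      # regression coefficient for being Black

swing_100 = 100 * coef_per_guest  # effect of 100 extra guests, in scale points

print(f"100 extra guests: {swing_100:.2f} points = {swing_100 / sd_quality:.3f} SD "
      f"(1/7 SD = {1 / 7:.3f})")
print(f"|Black coefficient| / 100-guest effect = {abs(coef_black) / swing_100:.2f}")
print(f"Five 100-guest swings = {5 * swing_100:.2f} points vs. {abs(coef_black):.2f}")
```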

The possibility of nonlinear effects of wedding size or other variables is not discussed.

Are the results plausible?

It is definitely possible that, for example, less complicated relationship histories, or larger weddings, do contribute to marital happiness early in the marriage. The authors speculate, based on psychological research from the 1970s, that the “desire for consistency” means “having more witnesses at a wedding may actually strengthen marital quality.”

Sure. The much bigger issue, however, is two kinds of selection. The first, which they address (very poorly), concerns spurious effects. For example, the simplest explanation is that (holding income constant) people with larger weddings simply had better relationships to begin with. Or, because personal income (not couple income; note that only one person from each couple was interviewed) is at best a very noisy indicator of the resources available to couples, big weddings may simply proxy for wealthier families.

Similarly, the finding that living with someone else prior to the current relationship is associated with poorer marriage quality may simply reflect that people who have trouble in relationships are more likely both to have lived with someone else and to have poor-quality marriages later. Cherlin et al. have reported, for example, that women with a history of sexual abuse are more likely to be in transitory relationships, including serial cohabiting relationships, so a history of abuse could account for some of these results. And so on.

The authors address this philosophically, which is all they can do given their data:

One obvious objection to this study is that it may be capturing what social scientists call “selection effects” rather than a causal relationship between our independent variables and the outcome at hand. That is, this report’s results may reflect the fact that certain types of people are more likely to engage in certain behaviors—such as having a child prior to marriage—that are correlated with experiencing lower odds of marital quality. It could be that these underlying traits or experiences, rather than the behaviors we analyzed, explain the associations reported here. This objection applies to most research that is not based on randomized experiments. We cannot prove causal associations between the personal and couple factors we explore and marital quality.

However, because they have rudimentary demographic controls, and the independent variables chronologically precede the outcome variable, they think they’re on pretty firm ground:

With the help of our research, we hope current and future couples will better understand the factors that appear to contribute to building a healthy, loving marriage in contemporary America.

This is Wilcox’s standard way of nodding to selection before plowing ahead with unjustified conclusions. This is not a reasonable approach, for reasons apparent in today’s New York Times. Tara Parker-Pope does not mention this issue, and her piece will obviously reach many more people than the original report or this post.

They hope people will take their results as relationship advice. In Parker-Pope’s piece, Stanley offers exactly the same advice he always gives. If that is to be the case, the best advice by far, based on their models, is to avoid being Black, and to finish high school. Living with both one’s biological parents at age 14 helps, too. In relationship terms, unfortunately, most of the results could just as easily reflect wealth or initial relationship quality rather than relationship decisions, and thus tell us that people who have healthy (and less complicated) relationships before marriage have healthy relationships in the first few years after marriage.

Perhaps more serious for this study design, however, is the second kind of selection: selection into the sample (through marriage). Anything that affects both the odds of marrying and the quality of marriage is potentially corrupting these results. This is a big, complicated issue, with a whole school of statistical methods attached to it. Unless they attend to that issue, this analysis should not be published.
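A toy simulation can make the sample-selection point concrete. In this Python sketch every variable and parameter is hypothetical: an unobserved `trait` raises both the odds of marrying and later marital quality, while `resources` raise the odds of marrying but have no effect on quality at all. Restricting the data to people who married, as the report does, still produces a spurious (in this setup, negative) correlation between resources and quality; real data could bias estimates in either direction.

```python
import math
import random
import statistics


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def corr_of(pairs: list[tuple[float, float]]) -> float:
    xs, ys = zip(*pairs)
    return pearson(list(xs), list(ys))


rng = random.Random(2014)  # fixed seed so the sketch is reproducible
everyone, married = [], []
for _ in range(20_000):
    trait = rng.gauss(0, 1)      # unobserved relationship skill (hypothetical)
    resources = rng.gauss(0, 1)  # premarital factor, independent of trait (hypothetical)
    quality = trait + rng.gauss(0, 1)  # potential marital quality: depends on trait only
    marries = trait + resources + rng.gauss(0, 1) > 1.0  # both raise the odds of marrying
    everyone.append((resources, quality))
    if marries:
        married.append((resources, quality))

r_all = corr_of(everyone)
r_married = corr_of(married)
print(f"everyone:     r = {r_all:+.3f} (n = {len(everyone)})")
print(f"married only: r = {r_married:+.3f} (n = {len(married)})")
```

The full-population correlation is close to zero, as it should be, because resources were never allowed to affect quality; the association among the married is produced entirely by who gets into the sample.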

On the funding

The authors state the project was “initially funded” by the National Institute of Child Health and Human Development, but the report also acknowledges support from the William E. Simon Foundation, a very conservative foundation that in 2012 gave hundreds of thousands of dollars to the Witherspoon Institute (which funded the notorious Wilcox/Regnerus research on children of same-sex couples), the Heritage Foundation, the Hoover Institution, the Manhattan Institute, and other conservative and Christian activist organizations. Details on funding are not provided.

The National Marriage Project is well-known for publishing only work that supports their agenda of marriage promotion. Some of what they publish may be true, but based on their track record they cannot be trusted as honest brokers of new research.


