The news is nothing I have to say, but rather the new article, available in prepublication form, by Simon Cheng and Brian Powell. It methodically flays the infamous Regnerus paper, leaving nothing but a wisp of foul-smelling ill-will trailing from its remains. (The paper is here, where it is paywalled; feel free to email me. Follow the whole story at the Regnerus tag.)
Cheng and Powell reanalyzed the Regnerus data, the New Family Structures Survey (NFSS), to see what would have happened if Regnerus had done the data processing and analysis right. This goes beyond the logical flaws and biases that were inherent in the study design (discussed here), to find the coding and analysis errors. A few examples:
- So much for “raised by…” 24 of the 236 people coded as having a “lesbian mother” or “gay father” — because they reported one of their parents ever had a same-sex romantic relationship (I’ll use LM and GF here to refer to Regnerus’s codes, not reality) — never lived with the parent in question! We had known previously that a large number (138) had never lived with the partner in the romantic relationship, but this is a whole nother level of wrong. A total of 58 of the LM/GF sample were reported to have lived with the supposedly gay or lesbian parent for a single year or less.
- Bad cases. The most ridiculous is the “25 year-old man who reports that his father had a romantic relationship with another man, but also reports that he (the respondent) was 7-feet 8-inches tall, weighed 88 pounds, was married 8 times and had 8 children.” Another reported being arrested for the first time at age 1. Real data collectors scrutinize cases like that and throw them out or find a way to fix them. (Really good data collectors stop the person — or the data entry — right when they say something outrageous, to see if they’re sure.)
- Illogical cases. There are a lot of these, including the person who reported “having always lived alone but also claims to have always lived with mother, father, and two grandparents.”
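Range and consistency checks that would catch cases like the ones listed above are not hard to automate. Here is a minimal sketch in Python. The field names and cutoffs are hypothetical, made up for illustration; they are not NFSS variable names and not the rules Cheng and Powell applied.

```python
# Toy plausibility screen for survey records.
# Field names and thresholds are hypothetical, chosen only to illustrate the idea.

def flag_implausible(record):
    """Return a list of reasons this record deserves a second look."""
    reasons = []

    height = record.get("height_inches")
    if height is not None and not (48 <= height <= 84):
        reasons.append("height out of range")

    arrest_age = record.get("age_first_arrest")
    if arrest_age is not None and arrest_age < 7:
        reasons.append("first arrest at implausibly young age")

    age = record.get("age")
    marriages = record.get("times_married")
    if age is not None and marriages is not None and age < 30 and marriages > 4:
        reasons.append("too many marriages for age")

    # Logical contradiction: nobody always lived alone and always lived with parents.
    if record.get("always_lived_alone") and record.get("always_lived_with_parents"):
        reasons.append("contradictory living arrangements")

    return reasons


# The 7-feet 8-inches respondent, using the details quoted in the list (92 inches tall).
tall_man = {"age": 25, "height_inches": 92, "weight_pounds": 88,
            "times_married": 8, "children": 8}
print(flag_implausible(tall_man))
# ['height out of range', 'too many marriages for age']
```

A flag here is only a prompt to look closer, not an automatic reason to drop the case.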
Then there is a series of bad analysis and modeling decisions Regnerus made, such as coding people who refused to answer a question as 0 instead of missing, or using the wrong kind of statistical model for the particular outcome.
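To see why the refusal coding matters, here is a toy illustration with made-up answers (not NFSS data). Coding a refusal as 0 counts it as a "no"; treating it as missing leaves it out of the calculation altogether.

```python
# Made-up yes/no answers: 1 = yes, 0 = no, None = refused to answer.
answers = [1, 0, 1, None, 1, None, 0, 1]

# Refusals coded as 0, so each refusal is counted as a "no".
as_zero = [0 if a is None else a for a in answers]
rate_as_zero = sum(as_zero) / len(as_zero)

# Refusals treated as missing and dropped before computing the rate.
observed = [a for a in answers if a is not None]
rate_as_missing = sum(observed) / len(observed)

print(rate_as_zero)               # 0.5
print(round(rate_as_missing, 3))  # 0.667
```

In this toy example the same eight answers give a "yes" rate of 50% or 67% depending only on how the two refusals are coded.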
When they get done with it, there really is no reliable, significant negative outcome associated with having lived any appreciable amount of time with a parent who might have been gay or lesbian. There’s more to it, but I don’t want to discourage you from reading the paper.
Random error, correlated outcome
Some of the “misclassified or uncertain” cases also report serious problems in adulthood, exhibiting higher-than-average rates of suicidality, depression, drinking to get drunk, and having a poor relationship with their mothers. So those could be people whose difficult lives rendered them unable to complete the life history calendar correctly. But there is also a chance that, like the 7’8″ guy, there are people just answering some of the questions at random. These were people taking the survey alone on a computer, with no supervision, and getting paid to be part of the sample. Clicking at random is not out of the question (one person took only 10 minutes to complete the lengthy survey).
Contrary to what you might assume, clicking at random does not always produce random results. I’ll illustrate this with an example. First, here’s another tidbit from Regnerus, which might fit this point. Speaking to some Franciscans in 2014, Regnerus (just after 9:00 of this video) was going on about sexual fluidity as a condition of modernity, when he dropped in this fact from the NFSS:
Despite comprising a mere 1.3 percent of the population, respondents in the NFSS [New Family Structures Survey] who said that their mothers have had a same-sex sexual relationship made up 15 [50?] percent of all the asexual identifiers in the NFSS. So, 15 [50?] percent of them come from 1.3 percent of the population. [I originally transcribed those as 50%, but on second listening I think he said 15%, but I can’t be sure.]
His raised eyebrow here is to indicate the deeply depraved nature of lesbian mothers — maybe it’s genetic, or maybe it’s child abuse — but… he lets the numbers speak for themselves. Lesbian mothers, asexual children.
Here’s how this works. If you are trying to find people in two rare conditions — for example, those with lesbian mothers and those who are asexual — and a small portion of your sample answers questions at random, not only will you have a relatively large number of false positives on your conditions, your rare conditions will also falsely appear to be correlated.
I’m sure I didn’t discover this, and I don’t have a mathematical proof for it, but it’s logical. And I confirmed it with an experiment, as follows.
Say you have a sample of 1000 people, and you’re studying two conditions that occur on average in one out of every 500 cases. I’ll call them “climbing Mt. Everest” and “going to the moon.” In your thousand cases, you will on average have 2 people who did each thing. The chances that the same person did both are probably really low (you do the maths). But, if just 1% of your sample — 10 people — answer those two yes/no questions at random, look out!
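Doing the maths for that split, under two assumptions of mine: the two conditions are independent among truthful respondents, and a random answerer says yes to each question with probability 1/2.

```latex
\begin{align*}
P(\text{both} \mid \text{truthful}) &= \frac{1}{500} \times \frac{1}{500} = \frac{1}{250{,}000} \\
E[\text{truthful respondents reporting both}] &= 990 \times \frac{1}{250{,}000} \approx 0.004 \\
P(\text{both} \mid \text{random}) &= \frac{1}{2} \times \frac{1}{2} = \frac{1}{4} \\
E[\text{random answerers reporting both}] &= 10 \times \frac{1}{4} = 2.5
\end{align*}
```

So on average, practically every double-yes in the sample would come from the ten random answerers.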
I created this scenario using Excel’s random-number function. With 990 people answering truthfully — that is, given a 1/500 chance of saying yes to each question — and 10 answering them both randomly, this is what I got: 6 people who had climbed Mt. Everest, and 8 people who had gone to the moon. But shockingly, there were 4 people who had done both — that is 67% of the mountain climbers and 50% of the moonshotters. You can’t know, from looking at the data, but I can, that all of the people who went on both adventures were in the tiny group of random answerers.
Here are the 1000 cases in random order, with green showing Everest-only cases, blue showing moon-only cases, and red showing positive answers to both questions. And here’s the statistic: in the total sample (990 serious survey takers and 10 jokers) the correlation between climbing Mt. Everest and going to the moon is .53!
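The experiment is easy to repeat outside Excel. What follows is a sketch in Python of the same setup, not the original spreadsheet: 990 respondents with a 1/500 chance of yes on each question, plus 10 who flip a coin on both. The function names and the seed are my own choices, and the numbers will differ from the single draw reported above.

```python
import random

def simulate(n_truthful=990, n_random=10, prevalence=1 / 500, rng=random):
    """Return (everest, moon, is_random) tuples of 0/1 answers."""
    rows = []
    for _ in range(n_truthful):
        # Truthful respondents: each condition is independently rare.
        rows.append((int(rng.random() < prevalence),
                     int(rng.random() < prevalence), 0))
    for _ in range(n_random):
        # Random answerers: a coin flip on each yes/no question.
        rows.append((rng.randint(0, 1), rng.randint(0, 1), 1))
    return rows

def pearson(xs, ys):
    """Pearson correlation; returns None if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None
    return cov / (vx * vy) ** 0.5

rng = random.Random(2015)  # arbitrary seed so a run can be repeated
rows = simulate(rng=rng)
everest = [r[0] for r in rows]
moon = [r[1] for r in rows]
both = [r for r in rows if r[0] and r[1]]
print("Everest:", sum(everest), "Moon:", sum(moon), "Both:", len(both))
print("Both, from random answerers:", sum(r[2] for r in both))
print("Correlation:", pearson(everest, moon))

# Repeat the experiment to see how much a single draw can vary.
results = []
for _ in range(200):
    rep = simulate(rng=rng)
    r = pearson([row[0] for row in rep], [row[1] for row in rep])
    if r is not None:
        results.append(r)
print("Mean correlation over replications:", sum(results) / len(results))
```

Because only ten people drive the result, individual draws bounce around a lot, which is why the last loop averages over many replications.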
Maybe Regnerus is just an incredibly, irresponsibly bad researcher, who didn’t conduct the simplest data checks before rushing to publish his paper. Or maybe he is a diabolical genius, and he realized that high random error rates in both his rare independent variable and his rare dependent variables would produce results showing poor outcomes for children of gays and lesbians.
In the Cheng and Powell paper, their various procedures and corrections wipe out many of Regnerus’s negative outcomes for GF/LM respondents before they tackle the “misclassified or uncertain” cases. But when they do tackle those cases, some of the last coefficients to fall to non-significance are indeed for relatively rare outcomes: having suicidal thoughts (7%), not being “entirely heterosexual” (15%), having had an STI (11%), and having had forced sex (13%). Each of these becomes non-significant when the bad cases are controlled in the Cheng and Powell models. I haven’t worked out a proof (ever), but I reckon that the rarer they are, the more likely they are to be correlated with the rare independent variable (LM/GF) if some people are answering at random, which they apparently were.
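Not a proof, but a back-of-the-envelope check on that hunch. Assume (a simplification of mine, not the NFSS and not the Cheng and Powell models) two yes/no items that are truly independent, with prevalences p_x and p_y among honest respondents, and a share eps of respondents who answer both by coin flip. The pooled covariance then works out to eps × (1 − eps) × (1/2 − p_x) × (1/2 − p_y), which is positive whenever both conditions are rarer than 50%. The sketch below turns that into a correlation; the function name and the example values are illustrative only.

```python
def expected_correlation(p_x, p_y, eps):
    """Population correlation of two independent 0/1 items when a share
    eps of respondents answers both by independent coin flips."""
    # Observed "yes" rates mix honest answers with coin flips.
    m_x = (1 - eps) * p_x + eps / 2
    m_y = (1 - eps) * p_y + eps / 2
    # Probability of a double yes: honest coincidence plus random double-yes.
    joint = (1 - eps) * p_x * p_y + eps / 4
    cov = joint - m_x * m_y
    return cov / (m_x * (1 - m_x) * m_y * (1 - m_y)) ** 0.5

# Hold one rare condition fixed and vary how rare the other one is.
# The 1% random share is an assumption, not an estimate from any survey.
for p_y in (0.002, 0.02, 0.07, 0.15, 0.30):
    r = expected_correlation(0.002, p_y, 0.01)
    print(f"prevalence {p_y:.3f}: expected correlation {r:.3f}")
```

For the Everest and moon setup (both at 1/500, 1% random) this gives a correlation of about .35, and the value shrinks as either condition becomes more common, which is the pattern the hunch predicts under these assumptions.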
Anyway, the Cheng and Powell paper speaks for itself. But I find it interesting that unchecked data error produces false positive (that is, negative) outcomes for marginal groups. Look out!