On artificially intelligent gaydar

A paper by Yilun Wang and Michal Kosinski reports being able to identify gay and lesbian people from photographs using “deep neural networks,” which means computer software.

I’m not going to describe it in detail here, but the gist of it is they picked a large sample of people from a dating website who said they were looking for same-sex partners, and an equal number who were looking for different-sex partners, and trained their computers to learn the facial features that could distinguish the two groups (including facial structure measurements as well as grooming things like hairline and facial hair). For a deep dive on the context of this kind of research and its implications, and more on the researchers and the controversy, please read this post by Greggor Mattson first. These notes will be most useful after you’ve read that.

I also reviewed a gaydar paper five years ago, and some of the same critiques apply.

This figure from the paper gives you an idea:


These notes are how I would start my peer review, if I were peer reviewing this paper (which is already accepted and forthcoming in the Journal of Personality and Social Psychology — so much for peer review [just kidding, it’s just a very flawed system]).

The gay samples here are “very” gay, in the sense of being out and looking for same-sex partners. This does not mean that they are “very” gay in any biological, or born-this-way sense. If you could quantitatively score people on the amount of their gayness (say on some kind of scale…), outness and same-sex attraction might be correlated, but they are different things. The correlation here is assumed, and assumed to be strong, but this is not demonstrated. (It’s funny that they think they address the problem of the sample by comparing the results with a sample from Facebook of people who like pages such as “I love being gay” and “Manhunt.”)

Another way of saying this is that the dependent variable is poorly defined, and conclusions from studying it are then generalized beyond the bounds of the research. So I don’t agree that the results:

provide strong support for the PHT [prenatal hormone theory], which argues that same-gender sexual orientation stems from the underexposure of male fetuses and overexposure of female fetuses to prenatal androgens responsible for the sexual differentiation of faces, preferences, and behavior.

If it were my study I might say the results are “consistent” with the PHT, but it would be better to say “not inconsistent” with the theory. (There is no data about hormones in the paper, obviously.)

The authors give too much weight to things their results can’t say anything about. For example, gay men in the sample are less likely to have beards. They write:

nature and nurture are likely to be as intertwined as in many other contexts. For example, it is unclear whether gay men were less likely to wear a beard because of nature (sparser facial hair) or nurture (fashion). If it is, in fact, fashion (nurture), to what extent is such a norm driven by the tendency of gay men to have sparser facial hair (nature)? Alternatively, could sparser facial hair (nature) stem from potential differences in diet, lifestyle, or environment (nurture)?

The statement is based on the faulty premise that “nature and nurture are likely to be as intertwined.” They have no evidence of this intertwining. They could just as well have said “it’s possible nature and nurture are intertwined,” or, with as much evidence, “in the unlikely event nature and nurture are intertwined.” So they loaded the discussion with the presumption of balance between nature and nurture, and then went on to speculate about sparse facial hair, for which they also have no evidence. (This happens to be the same way Charles Murray talks about race and IQ: there must be some intertwining between genetics and social forces, but we can’t say how much; now let’s talk about genetics because it’s definitely in there.)

Aside from the flaws in the study, the accuracy rate reported is easily misunderstood, or misrepresented. To choose one example, the Independent wrote:

According to its authors, who say they were “really disturbed” by their findings, the accuracy of an AI system can reach 91 per cent for homosexual men and 83 per cent for homosexual women.

The authors say this, which is important but of course overlooked in much of the news reporting:

The AUC = .91 does not imply that 91% of gay men in a given population can be identified, or that the classification results are correct 91% of the time. The performance of the classifier depends on the desired trade-off between precision (e.g., the fraction of gay people among those classified as gay) and recall (e.g., the fraction of gay people in the population correctly identified as gay). Aiming for high precision reduces recall, and vice versa.

They go on to give a technical and, I believe, misleading example. People should understand that the computer was always picking between two people, one of whom was identified as gay and the other not. It had a high chance of getting that choice right. That’s not saying, “this person is gay”; it’s saying, “if I had to choose which one of these two people is gay, knowing that one is, I’d choose this one.” What they don’t answer is this: Given 100 random people, 7 of whom are gay, how many would the model correctly classify as gay or not gay? That is the real-life question most people probably think the study is answering.
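To make the distinction concrete, here is a minimal Python sketch under loudly hypothetical assumptions: the classifier's scores for straight people follow a standard normal distribution, scores for gay people follow a normal distribution with the same spread shifted up by a gap `d`, and `d` is chosen so the pairwise AUC is .91. None of this is the paper's actual model, and the helper names (`separation_for_auc`, `pairwise_accuracy`, `expected_counts`) and the cutoff are illustrative inventions. The first part simulates the two-person task the paper scored; the second asks the 100-people, 7-gay question.

```python
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution

# HYPOTHETICAL score model, not the paper's:
# straight scores ~ N(0, 1), gay scores ~ N(d, 1).
def separation_for_auc(auc: float) -> float:
    """Gap d between group means that yields `auc` under this model,
    using AUC = Phi(d / sqrt(2)) for two unit-variance normals."""
    return STD.inv_cdf(auc) * 2 ** 0.5

def pairwise_accuracy(d: float, n_pairs: int, seed: int = 0) -> float:
    """The two-person task: show one gay and one straight person, pick
    whoever scores higher as gay, and count how often that is right."""
    rng = random.Random(seed)
    hits = sum(rng.gauss(d, 1) > rng.gauss(0, 1) for _ in range(n_pairs))
    return hits / n_pairs

def expected_counts(n: int, n_pos: int, d: float, threshold: float):
    """Expected (true pos, false pos, false neg, true neg) when everyone
    scoring above `threshold` in a group of n people (n_pos gay) is labeled gay."""
    recall = 1 - STD.cdf(threshold - d)  # share of gay people above the cutoff
    fpr = 1 - STD.cdf(threshold)         # share of straight people above the cutoff
    tp = n_pos * recall
    fp = (n - n_pos) * fpr
    return tp, fp, n_pos - tp, (n - n_pos) - fp

d = separation_for_auc(0.91)
print(f"pairwise accuracy: {pairwise_accuracy(d, 20_000):.3f}")  # close to 0.91

# The real-life question: 100 random people, 7 of them gay. A cutoff halfway
# between the two group means is an arbitrary illustrative choice.
tp, fp, fn, tn = expected_counts(n=100, n_pos=7, d=d, threshold=d / 2)
print(f"flagged as gay: {tp + fp:.1f}, actually gay among them: {tp:.1f}")
print(f"precision {tp / (tp + fp):.2f}, recall {tp / 7:.2f}")
```

Under these assumptions the pairwise task comes out near 91 percent correct, yet at this cutoff most of the people flagged as gay are not gay, because the 93 straight people vastly outnumber the 7 gay ones. Moving the cutoff changes the trade-off, which is the precision/recall point the authors themselves make.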

As technology writer Hal Hodson pointed out on Twitter, if someone wanted to scan a crowd and identify a small number of individuals who were likely to be gay (while ignoring many other people in the crowd who are also gay), this might work (with some false positives, of course).
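Hodson's scenario sits at the high-precision, low-recall end of the trade-off. Here is a minimal sketch, again assuming hypothetical equal-variance normal scores with a gap that gives AUC = .91, and a crowd in which 7 percent are gay (neither assumption comes from the paper; `precision_recall` is an illustrative helper). Raising the cutoff makes the few people flagged more likely to be gay while the share of gay people actually found shrinks.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution

def precision_recall(prevalence: float, d: float, threshold: float):
    """Precision and recall when everyone scoring above `threshold` is flagged.
    HYPOTHETICAL model: straight scores ~ N(0, 1), gay scores ~ N(d, 1)."""
    recall = 1 - STD.cdf(threshold - d)     # share of gay people flagged
    fpr = 1 - STD.cdf(threshold)            # share of straight people flagged
    tp_share = prevalence * recall          # crowd share correctly flagged
    fp_share = (1 - prevalence) * fpr       # crowd share wrongly flagged
    return tp_share / (tp_share + fp_share), recall

d = STD.inv_cdf(0.91) * 2 ** 0.5  # gap giving AUC = 0.91 under this model
for threshold in (1.0, 2.0, 3.0, 4.0):
    precision, recall = precision_recall(0.07, d, threshold)
    print(f"cutoff {threshold}: precision {precision:.2f}, recall {recall:.2f}")
```

In this toy model the highest cutoff flags very few people, nearly all of them correctly, while missing almost every gay person in the crowd.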


Probably someone who wanted to do that would be up to no good, like an oppressive government or Amazon, and they would have better ways of finding gay people (like at pride parades, or looking on Facebook, or dating sites, or Amazon shopping history directly — which they already do of course). Such a bad actor could also train people to identify gay people based on many more social cues; the researchers here compare their computer algorithm to the accuracy of untrained people, and find their method better, but again that’s not a useful real-world comparison.

Aside: They make the weird but rarely-necessary-to-justify decision to limit the sample to White participants (and also offer no justification for using the pseudoscientific term “Caucasian,” which you should never ever use because it doesn’t mean anything). Why couldn’t respondents (or software) look at a Black person and a White person and ask, “Which one is gay?” Any artificial increase in the homogeneity of the sample will increase the likelihood of finding patterns associated with sexual orientation, and misleadingly increase the reported accuracy of the method used. And of course statements like this should not be permitted: “We believe, however, that our results will likely generalize beyond the population studied here.”

Some readers may be disappointed to learn I don’t think the following is an unethical research question: Given a sample of people on a dating site, some of whom are looking for same-sex partners and some of whom are looking for different-sex partners, can we use computers to predict which is which? To the extent they did that, I think it’s OK. That’s not what they said they were doing, though, and that’s a problem.

I don’t know the individuals involved, their motivations, or their business ties. But if I were a company or government in the business of doing unethical things with data and tools like this, I would probably like to hire these researchers, and this paper would be good advertising for their services. It would be nice if they pledged not to contribute personally to such work, especially any efforts to identify people’s sexual orientation without their consent.

Responses on fatherhood: hormones, science and god

The fatherhood post yesterday has gotten (for this blog) a lot of readers and some interesting responses. As I wrote out some extended, disorganized responses to comments, I realized I may as well elevate them to an independent post (still a disorganized rant, though).

I like the discussion by the authors on the Scientific American blog suggested by szopeno. Like I said in the original post, it’s quite reasonable that caring behavior affects hormone levels, as we know things like stress and fear do as well, with all kinds of mental and physical effects. If you randomly subjected some people to competitive athletic coaching, and handed others an infant, I wouldn’t be surprised to see the competition people behaving more aggressively and the baby-holders being more nurturing on average three months later. That would be interesting.

What is the implication? Are we shocked that some aspects of fatherhood (or childcare or sex) provoke a “biological” response? If that shocks you, you might like to know that by simply showing people pictures of other people behaving in certain ways, their bodies are more likely to undergo spontaneous physical transformations. Just from sitting there looking at pictures! Also, if you inject an athlete with testosterone he can ride his bike really fast.


It does not follow from these findings of a hormonal response to life events that we should promote certain family arrangements as “natural,” which is where Wilcox and the religious-sociological-complex is taking this. If the goal is to change men’s testosterone levels, that might be done with medication. If the goal is to reduce aggressiveness, try teaching meditation in public schools. If we want people to be better parents, we can give them jobs, healthcare, housing and childcare support.

We have lots of ways of trying to promote happiness and pro-social behavior. However, like the crazy list of potential risks and side effects for men taking low-T medication, there are consequences to any such intervention.

Fortunately for individual freedom and human rights, some of us know that we can punish or prevent bad behavior — and reward or encourage good behavior — without attacking or rewarding whole status categories of people. Children with rich, married, college-educated parents are more likely to get into and finish college. So, we ought to fund a public school system, fund student loans for college — and also protect the children of the evil, sick or ineffective rich, married, college-educated parents from harm. But that doesn’t mean we should sterilize poor people.

So, is fatherhood good?

It’s not a question with one answer. One of the things Wilcox and the family “gold standard” promoters do is find ways that people in “traditional” families are doing better on average and use that to promote family conformity. But the averages conceal the sources of variation. Comparing the average father to the average non-father won’t tell you much about how fatherhood affects men because fatherhood occurs along with so many combinations of other transitions, experiences and resources. If you randomly assigned fatherhood to random men — at random moments in their lives — you could come up with an answer. Otherwise I’m not optimistic, and if it’s not answerable I doubt it’s a good question for social science.

Imagine three sets of outcomes: money, happiness and healthiness. Each is affected by social background and context. Then consider men entering fatherhood with different levels of each beforehand, and see how each outcome changes for all the different combinations (e.g., income changes for rich, happy, healthy; income changes for poor, happy, healthy; etc.). The possibilities multiply. If you’re Brad Wilcox you can work back from your goal — married nuclear families — and compare them to everyone else to cherry-pick any worse outcome at any time, and lo, discover that the Bible was right after all. If you really want to know it’s not so easy.

I haven’t yet read Doing the Best I Can: Fatherhood in the Inner City, the new book by Kathryn Edin and Timothy J. Nelson, but that seems promising for an in-depth look at fatherhood in the flow of men’s lives, with a lot of attention to the social context (education, employment, incarceration, complex families and relationships, etc.).


Don’t take my Word for it

If you start from a God-given definition of what’s good, and science can’t change that, then science becomes just a convenient way of explaining what you already knew, which is not science — it’s what the Church calls “natural law.”

Wilcox denies that’s how it works, naturally. At a conference on the family and natural law, he was quoted as saying,

Our support for the renewal of marriage is not predicated on some … religious worldview. Rather, it’s based on a reasonable understanding of the human condition that is accessible to all men and women of good will. … Evidence suggests to us that intact, biological marriage is still the gold standard.

That depends on what you mean by “predicated.” Years before the “Regnerus affair,” during which Mark Regnerus joined Wilcox in a scheme to use science against marriage equality in the courts, Regnerus gave his view of the importance of (a certain kind of) marriage, and it did not originate from his scientific training:

The importance of Christian marriage as a symbol of God’s covenantal faithfulness to his people—and a witness to the future union of Christ and his bride—will only grow in significance as the wider Western culture diminishes both the meaning and actual practice of marriage. Marriage itself will become a witness to the gospel.

That divine law and natural law do not conflict is an article of faith (literally). I think Pope Leo XIII put it well when he wrote:

Now, reason itself clearly teaches that the truths of divine revelation and those of nature cannot really be opposed to one another, and that whatever is at variance with them must necessarily be false. Therefore, the divine teaching of the Church, so far from being an obstacle to the pursuit of learning and the progress of science, or in any way retarding the advance of civilization, in reality brings to them the sure guidance of shining light.

Thus, rather than see science as a candle in the dark, natural law says that science needs a candle in the dark, and God has one. Could any research penetrate that mindset? If your research contradicts the “truths of divine revelation” then your research is wrong. Try again! Science in this vein is just looking for ways to convince secular society that the Church is already right. It is a self-fulfilling prophecy (literally). From the same natural law conference news report:

While there’s limited data on the effects of same-sex marriage on children, Wilcox hypothesized that in a few years, research will show that children in lesbian or gay family situations will exhibit some of the same problems as children from father-less or cohabiting relationships.

That conference was in January 2011. At that point Wilcox already had the New Family Structure Study machinery in motion, which would end up confirming to the faithful what they already knew.

Fatherhood’s transformations: What if someone checked the facts?

For Father’s Day, W. Bradford Wilcox wrote a piece for Slate, “Daddy’s Home: Fatherhood transforms men. But only if they live with their kids.” Raising the question: what if Slate checked the facts they published? Oh, right.


Anyway, people who publish what Wilcox writes by now have been duly notified of his dishonesty, data manipulation, and incompetence in the service of ideological (and financial) ends. But I do this as an exercise in critical thinking and research, and to help people who want to be informed understand these shenanigans.

So here we go, on the specific claims only. Bogus claim-inflating spin between claims ignored. For each claim, the source, interpretation, and Veracity Score™ from 0 to 10.


The claim: “For many men, the transformative physical power of fatherhood first manifests itself when the pounds start piling up. One recent survey found that the average father puts on more than 10 pounds while waiting for baby to arrive.”

  • The source: A Motherlode post that links to a BBC news story that reproduces a press release from a marketing firm. No information on the research methods. The marketing firm, Onepoll, has no information about the poll on its website. Does this supposed weight gain of fathers-to-be reflect the evolutionary draw of wedded fatherhood? Said a spokesperson for Onepoll (who we are listening to why?): “So if the kitchen cupboards are suddenly brimming with snacks and food, it’s no wonder blokes are tempted to tuck in as well.”
  • Veracity Score: 2 (Maybe fathers gain weight during their partners’ pregnancies. This is “transformative physical power”?)

The claim: “Studies suggest that after the arrival of a baby men’s testosterone falls…”

  • The source: This paper measuring testosterone levels in a sample of Filipino men at two points four years apart. Those who were married with children four years later had larger drops and lower levels. This seems like a legitimate finding. Why wouldn’t men’s hormone levels change in response to such changes in their environment and behavior? It seems a little dicey that the men took their own saliva samples, since other research (from the same study) shows levels change dramatically in the first half hour after people wake up. They supposedly took the samples right when they woke up and recorded the time, but it still seems possible sleep is the issue here (although they controlled for a simple indicator of “sleep quality”).
  • Veracity Score: 6

The claim: “Studies suggest that after the arrival of a baby men’s… prolactin levels rise.”

  • The source: Unsourced claim from a parenting advice book.
  • Veracity Score: 0 (Maybe true, who knows? No evidence here.)

The claim: “But these hormonal changes don’t just happen for any father; they appear to be most likely for men who are living in a long-term relationship with the mother of their children.”

  • Source: The link is to the same Philippines study. There, the authors write: “Because this sample is drawn from a cultural setting in which it is rare for men to become new fathers outside of stable romantic partnerships or to file for divorce, there were few single new fathers (n = 12) or divorced men (n = 9), who therefore were excluded from longitudinal analyses.”
  • Veracity Score: 0 (the study explicitly excluded single and divorced fathers, so it could not show that)

The claim: “Moreover, research by anthropologist Peter Gray indicates that drops in testosterone are most pronounced among men engaged in ‘affiliative pair bonding and paternal care.'”

  • Source: This paper that measured testosterone levels at one point in time among 126 men in Beijing, 30 of whom were fathers (all conveniently having exactly one child). The fathers had lower testosterone levels. The paper doesn’t have a longitudinal design, however, so it can’t make causal claims – and does not mention drops in testosterone. Also, among the fathers there was no difference between those with younger children and those with older children, which is not good for the hands-on-nurturing effect theory.
  • Veracity Score: 2 (some association, no causal connection, bogus description of “drops” by Wilcox)

The claim: “Fathers who live with their children are significantly less likely to be depressed, and more likely to report they are satisfied with their lives, compared to childless men.”

  • The source: This paper using the National Survey of Families and Households. The paper did not use longitudinal data, and the authors wrote, “Before we can be confident that fatherhood is causally associated with the outcomes we observed, we must address this possibility [of selection effects], probably with longitudinal data.” Is it just possible that happier men are more likely to live with their children, rather than the other way around? I’d consider it. Further, they entered some statistical controls for marital status, income, and race/ethnicity. They wrote:
    • Once a number of controls were entered, however, especially marital status, these effects of fatherhood largely disappeared. The two exceptions were that men with children living elsewhere remain somewhat more likely to have depressive symptoms than men who were currently living with their children, and men with older children were slightly more satisfied. For the most part, however, fatherhood does not appear to be independently associated with psychological and physical health.
  • To clarify: (a) those living with children over age 19 were slightly more likely than those with younger children to be satisfied with their lives! (b) those living apart from their children were slightly more likely to be depressed than those living with their children. This does not require evolutionary rocket-science to predict.
  • Veracity Score: 0 (would be 1 for the partial effect, but he loses a point for gross exaggeration and misstatement of the findings.)
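The way those fatherhood effects vanished once marital status was controlled can be illustrated with a toy simulation. This is a hypothetical population, not the NSFH data: by construction, living with children has no effect at all on happiness, but married men are both happier on average and more likely to live with their children. The parameter values and helper names (`simulate`, `gap`) are arbitrary illustrative choices.

```python
import random
from statistics import fmean

def simulate(n: int, seed: int = 0):
    """HYPOTHETICAL population: living with children has NO effect on
    happiness; marriage is linked to both happiness and co-residence."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        married = rng.random() < 0.5
        happiness = rng.gauss(1.0 if married else 0.0, 1.0)  # marriage shifts happiness
        lives_with_kids = rng.random() < (0.7 if married else 0.2)
        people.append((married, lives_with_kids, happiness))
    return people

def gap(people):
    """Mean happiness of men living with children minus men who are not."""
    with_kids = [h for _, k, h in people if k]
    without = [h for _, k, h in people if not k]
    return fmean(with_kids) - fmean(without)

people = simulate(50_000)
print(f"raw gap: {gap(people):.2f}")
# "Controlling" for marital status by comparing within each group.
for status in (True, False):
    stratum = [p for p in people if p[0] == status]
    print(f"gap among married={status}: {gap(stratum):.2f}")
```

The raw comparison makes fathers who live with their children look happier, but within each marital-status group the gap is close to zero, which is the pattern you would expect if marriage, not co-residence with children, carries the association.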

The claim: “After the arrival of a baby, new fathers tend to work more hours and pull down more money.”

  • The source: The link is to a 1998 book by the late Steven Nock. I don’t have the book, but more recent research by Rebecca Glauber (using the same data) confirms this. Glauber suggests the effects could be the result of the gender division of labor within marriage or employers’ preferential treatment (or other unobserved factors).
  • Veracity Score: 8 (true statement, but doesn’t address whether “fatherhood is socially transformative for men.”)

The claim: “By contrast, men who have children outside of wedlock, Nock found, are less likely to be employed, earn less, and have higher rates of poverty compared to their peers who did not father children outside of wedlock.”

  • The source: This paper by Nock. When you are talking about a group of men who are poor on average to begin with, it’s hard to know if unmarried fatherhood made them poorer. The paper reports that these negative outcomes are associated with men who father children outside of marriage. However, the paper also reports (grudgingly) that the effects are no longer significant when prior conditions are controlled or when brothers are compared with each other. Thus, there is no causal effect found for men having children outside of marriage.
  • Veracity Score: 1 (“less likely” is true, but the non-causal nature means it offers no support for the claim that “fatherhood is socially transformative for men.”)

The claim: “men who live apart from their children attend church infrequently and drink more frequently, much like their peers without children.”

  • The source: This paper again. A quick check of the tables shows that alcohol and drug abuse is no different between men who live with children versus without once a few basic demographic variables are controlled, so that’s wrong. As for church, the paper shows the association in the cross section but makes no causal claim.
  • Veracity Score: 1 (association for 1/2 the claim, no causal effect.)

You will note some of these claims have Veracity Scores > 0. I didn’t ignore those claims because that would have been misleading (however, if I’m wrong and the testosterone stuff is wronger than I thought, please let me know).

To some blog editors, claims come and go. You might balance them out by posting something from someone who disagrees. Hanna Rosin, a founder of Slate’s XX, dismissively calls this “data wars.” More clicks for them. One alternative would be to not publish the bogus stuff in the first place.

h/t Neal Caren for suggesting this.