Author Archives: Philip N. Cohen

Who’s happy in marriage? (Not just rich, White, religious men, but kind of)

I previously said there was a “bona fide trend back toward happiness” within marriage for the years 2006 to 2012. This was based on the General Social Survey trend going back to 1973, with married people responding to the question, “Taking all things together, how would you describe your marriage?”

Since then, the bona fide trend has lost its pop. Here’s my updated figure:


I repeated this analysis controlling for age, race/ethnicity, and education, with year specified in quadratic form. This shows happiness falling to a trough around 2004 and then starting to trend back. But given the last two data points, confidence in that rebound is weak. Still, a solid majority are happy with their marriages.
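The trough comes from the vertex of the quadratic: with happiness modeled as a + b1·t + b2·t², where t is years since 1973, the minimum falls at t = −b1/(2·b2). A quick Python check of that arithmetic, with hypothetical coefficients (not the fitted GSS values):

```python
# Trough of a quadratic trend in year: vertex at t = -b1 / (2 * b2).
# Coefficients are hypothetical, chosen only to illustrate the arithmetic.
def trough_year(b1, b2, base_year=1973):
    """Year at which a + b1*t + b2*t^2 (t = year - base_year) bottoms out."""
    return base_year - b1 / (2 * b2)

print(trough_year(-0.0186, 0.0003))  # a decline that bottoms out and turns up
```

With these made-up coefficients the vertex lands at 2004, matching the trough described above.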

Who’s happy?

But who are those happy-in-marriage people? Combining the last three surveys (2012, 2014, and 2016), this is what we get (the effect of age and the non-effect of education are not shown). Note the y-axis starts at 50%.


So to be happy in marriage, my expert opinion is you should become male and White, see yourself as upper class, go to church all the time, and have extreme political views. And if you’re not all those things, don’t let the marriage promoters tell you what your marriage is going to be like.

Note: I analyzed the political views question before, so this is an update to that. (On trends and determinants of social class identification, see this post.)

Here’s my Stata code, written to run on the full GSS through 2016 data. Play along at home!

set maxvar 10000
use "GSS7216_R1a.dta", clear
gen since73 = year-1973
gen rwgt = round(wtssall) /* round the weights so they can be used as frequency weights */
keep if year>1972
gen verhap = hapmar==1 if hapmar<. /* 1 = "very happy"; missing for those not asked (not married) */
logit verhap c.age##c.age i.race c.since73##c.since73 [fweight=rwgt]
margins, at(since73=(0(1)43))
recode attend (1/3=1) (4/6=2) (7/8=3), gen(attendcat)
logit verhap c.age##c.age i.sex i.race i.class i.attendcat i.polviews if year>2010 [fweight=rwgt] /* i.sex must be in the model for the margins over sex */
margins sex race class attendcat polviews if year>2010



Filed under Research reports

Hitting for the cycle in Trump land

While cautious about the risks of normalizing Trump, I have nevertheless attempted to engage a little with his followers on Twitter, which is the only place I usually meet people who are willing to support him openly. One exchange yesterday struck me as iconic, so I thought I’d share it.

Maybe if I’d studied conversation or text analysis more I would be less amazed at how individuals acting alone manage to travel the same discursive paths with such regularity. In this case a Trump supporter appears to spontaneously recover this very common path over a short handful of tweets:

  1. I don’t believe your facts
  2. If they are true it’s no big deal
  3. Obama was worse
  4. Nothing matters everyone is corrupt

The replies got jumbled up so I use screenshots as well as links (you can start here if you want to try to follow it on Twitter).

Ivanka Trump tweeted something about how she was going to India. Since I’m blocked by Donald but not Ivanka, if it’s convenient I sometimes do my part by making a quick response to her tweets. I said, “Your representation of the US in India epitomizes the corruption and incompetence of this administration.”


The responses by @armandolbstd and @dreadGodshand are very typical, demanding “proof” about things that are obvious to basically informed people. I made the typical mistake of thinking we could talk about common facts, using the word “literally” a lot:


OK, so then I got sucked in with what I thought was the most obvious example of corruption, leading @dreadGodshand into the whole cycle:


Interesting how the “OK, maybe it’s true, but so what” move we hear constantly suddenly strikes him as a new question. And from there it’s on through no-big-deal to Obama-was-worse to nothing-matters:


And he concluded, “I’m not hating obama for it. It’s not that big of a deal. It’s designed that way to help their parties. Who really cares?”

This reminds me of the remarkable shift in attitudes toward immoral conduct among White evangelicals, who used to think it was a very big deal if elected officials (Obama) did immoral things in private but now (Trump) shrug:


People do change. But I don’t put that much stock in changing people, and contrary to popular belief I don’t think that’s how you have to win elections. In the end defeating Trumpism politically means outvoting people who think like this, which will be the result of a combination of things: increasing turnout (one way or the other) among people who oppose him, decreasing turnout among people who support him, and changing the number of people in those two categories.

You might think this example just shows the futility of conversations like this, but maybe I’m missing some opportunity to get through. And it’s also possible that this kind of thing is demoralizing to Trump supporters, which could be good, too. So, live and learn.


Filed under Politics

Science finds tiny things nowadays (Malia edition)

We have to get used to living in a world where science — even social science — can detect really small things. Understanding how important really small things are, and how to interpret them, is harder nowadays than just finding them.

Remember when Hanna Rosin wrote this?

One of the great crime stories of the last twenty years is the dramatic decline of sexual assault. Rates are so low in parts of the country — for white women especially — that criminologists can’t plot the numbers on a chart.

Besides being wrong about rape (it has declined a lot, but it’s still high compared with most countries), this was a funny statement about science (I’ve heard we can even plot negative numbers now!). But the point is we have problems understanding, and communicating about, small things.

So, back to names.

In 2009, the peak year for the name Malia in the U.S., 1,681 girls were given that name, according to the Social Security Administration, or .041% of the 4.14 million children born that year (there are no male Malias in the SSA’s public database, meaning they have never recorded more than 4 in one year). That year, 7.5% of women ages 18-44 had a baby. If my arithmetic is right, say you know 100 women ages 18-44, and each of them knows 100 others (and there is no overlap in your network). That would mean there is a 30% chance one of your 10,000 friends of a friend had a baby girl and named her Malia in 2009. But probably there is a lot of overlap; if your friend-of-friend network is only 1,000 women 18-44 then that chance would fall to 3%.
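That back-of-envelope arithmetic can be checked directly, using only the numbers in the paragraph above:

```python
# Chance that someone in an extended network had a girl named Malia in 2009.
# Inputs from the text: 1,681 Malias among ~4.14 million births, and a 7.5%
# chance that a woman ages 18-44 had a baby that year.
p_malia_per_birth = 1681 / 4_140_000      # ~.041% of births
p_per_woman = 0.075 * p_malia_per_birth   # had a baby AND named her Malia

for n in (10_000, 1_000):
    expected = n * p_per_woman             # expected number in the network
    at_least_one = 1 - (1 - p_per_woman) ** n
    print(n, round(expected, 3), round(at_least_one, 3))
```

For 10,000 women the expected count is about 0.30 (the “30% chance” above reads it as a probability; the exact at-least-one chance is a bit lower, about 26%, because of the small chance of more than one Malia). For 1,000 women it’s about 3% either way.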

Here is the trend in girls named Malia, relative to the total number of girls born, from 1960 to 2016:


To make it easier to see the Malias, here is the same chart with the y-axis on a log scale.


This shows that Malia has been on a long upward trend, from fewer than 50 per year in the 1960s to more than 1,000 per year now. And it also shows a pronounced spike in 2009, the year Malia peaked at .041%. In that year, the number of people naming daughters Malia jumped 75% before declining over the next three years to resume its previous trend. Here is the detail on the figure, just showing the Malia trend for 2005-2016:


What happened there? We can’t know for sure. Even if you asked everyone why they named their kid what they did, I don’t know what answers you would get. But from what we know about naming patterns, and their responsiveness to names in the news (positive or negative), it’s very likely that the bump in 2009 resulted from the high profile of Barack Obama and his daughter Malia, who was 11 when Obama was elected.

What does a causal statement like that really mean? In 2009, it looks to me like about 828 more people named their daughters Malia than would have otherwise, taking into account the upward trend before 2008. Here’s the actual trend, with a simulated trend showing no Obama effect:


Of course, Obama’s election changed the world forever, which may explain why the upward trend for Malia accelerated again after 2013. But in this simple simulation, which brings the “no Obama” trend back into line with the actual trend in 2014, there were 1,275 more Malias born than there would have been without the Obama election. This implies that over the years 2008-2013, the Obama election increased the probability of someone naming their daughter Malia by .00011, or .011%.
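The .011% comes from simple division. A quick check, assuming roughly 2 million girls born per year (about half of the ~4 million total births; that even split is my assumption):

```python
# Excess Malias attributed to the Obama election, spread over 2008-2013,
# expressed as a change in the per-girl naming probability. The ~2 million
# girls-per-year figure is an assumption (half of ~4 million total births).
excess_malias = 1_275
girls_per_year = 2_000_000
years = 6  # 2008 through 2013

delta_p = excess_malias / (girls_per_year * years)
print(round(delta_p, 5), f"{delta_p:.3%}")
```

That works out to about .00011, or .011%, matching the figure in the text.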

That is a very small effect. I think it’s real, and very interesting. But what does it mean for anything else in the world? This is not a question of statistical significance, although those tools can help. (These names aren’t a probability sample; this is the complete list of names given.) So this is a question of interpreting research findings now that we have these incredibly powerful tools and very big data to analyze with them. The number alone doesn’t tell the story.


Filed under Me @ work

On artificially intelligent gaydar

A paper by Yilun Wang and Michal Kosinski reports being able to identify gay and lesbian people from photographs using “deep neural networks,” which means computer software.

I’m not going to describe it in detail here, but the gist of it is they picked a large sample of people from a dating website who said they were looking for same-sex partners, and an equal number that were looking for different-sex partners, and trained their computers to learn the facial features that could distinguish the two groups (including facial structure measurements as well as grooming things like hairline and facial hair). For a deep dive on the context of this kind of research and its implications, and more on the researchers and the controversy, please read this post by Greggor Mattson first. These notes will be most useful after you’ve read that.

I also reviewed a gaydar paper five years ago, and some of the same critiques apply.

This figure from the paper gives you an idea:


These notes are how I would start my peer review, if I was peer reviewing this paper (which is already accepted and forthcoming in the Journal of Personality and Social Psychology — so much for peer review [just kidding it’s just a very flawed system]).

The gay samples here are “very” gay, in the sense of being out and looking for same-sex partners. This does not mean that they are “very” gay in any biological, or born-this-way sense. If you could quantitatively score people on the amount of their gayness (say on some kind of scale…), outness and same-sex attraction might be correlated, but they are different things. The correlation here is assumed, and assumed to be strong, but this is not demonstrated. (It’s funny that they think they address the problem of the sample by comparing the results with a sample from Facebook of people who like pages such as “I love being gay” and “Manhunt.”)

Another way of saying this is that the dependent variable is poorly defined, and conclusions from studying it are then generalized beyond the bounds of the research. So I don’t agree that the results:

provide strong support for the PHT [prenatal hormone theory], which argues that same-gender sexual orientation stems from the underexposure of male fetuses and overexposure of female fetuses to prenatal androgens responsible for the sexual differentiation of faces, preferences, and behavior.

If it were my study I might say the results are “consistent” with PHT theory, but it would be better to say, “not inconsistent” with the theory. (There is no data about hormones in the paper, obviously.)

The authors give too much weight to things their results can’t say anything about. For example, gay men in the sample are less likely to have beards. They write:

nature and nurture are likely to be as intertwined as in many other contexts. For example, it is unclear whether gay men were less likely to wear a beard because of nature (sparser facial hair) or nurture (fashion). If it is, in fact, fashion (nurture), to what extent is such a norm driven by the tendency of gay men to have sparser facial hair (nature)? Alternatively, could sparser facial hair (nature) stem from potential differences in diet, lifestyle, or environment (nurture)?

The statement is based on the faulty premise that “nature and nurture are likely to be as intertwined.” They have no evidence of this intertwining. They could just as well have said “it’s possible nature and nurture are intertwined,” or, with as much evidence, “in the unlikely event nature and nurture are intertwined.” So they loaded the discussion with the presumption of balance between nature and nurture, and then go on to speculate about sparse facial hair, for which they also have no evidence. (This happens to be the same way Charles Murray talks about race and IQ: there must be some intertwining between genetics and social forces, but we can’t say how much; now let’s talk about genetics because it’s definitely in there.)

Aside from the flaws in the study, the accuracy rate reported is easily misunderstood, or misrepresented. To choose one example, the Independent wrote:

According to its authors, who say they were “really disturbed” by their findings, the accuracy of an AI system can reach 91 per cent for homosexual men and 83 per cent for homosexual women.

The authors say this, which is important but of course overlooked in much of the news reporting:

The AUC = .91 does not imply that 91% of gay men in a given population can be identified, or that the classification results are correct 91% of the time. The performance of the classifier depends on the desired trade-off between precision (e.g., the fraction of gay people among those classified as gay) and recall (e.g., the fraction of gay people in the population correctly identified as gay). Aiming for high precision reduces recall, and vice versa.

They go on to give a technical, and I believe misleading, example. People should understand that the computer was always picking between two people, one of whom was identified as gay and the other not. It had a high percentage chance of getting that choice right. That’s not saying, “this person is gay”; it’s saying, “if I had to choose which of these two people is gay, knowing that one is, I’d choose this one.” What they don’t answer is this: Given 100 random people, 7 of whom are gay, how many would the model correctly identify yes or no? That is the real-life question most people probably think the study is answering.
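To see why the pairwise-accuracy number can’t answer that question, here’s a sketch with a hypothetical operating point: suppose the classifier’s threshold yields an 80% true-positive rate and a 20% false-positive rate (made-up numbers for illustration, consistent with a good-but-imperfect classifier), applied to 100 people of whom 7 are gay:

```python
# Why pairwise accuracy (AUC) is not "percent of people correctly labeled."
# The TPR/FPR below are hypothetical; the point is the effect of the 7% base rate.
population = 100
gay = 7
not_gay = population - gay

tpr = 0.80  # of the 7 gay people, the share flagged
fpr = 0.20  # of the 93 others, the share flagged anyway

true_pos = gay * tpr         # 5.6 correctly flagged
false_pos = not_gay * fpr    # 18.6 wrongly flagged
precision = true_pos / (true_pos + false_pos)
print(round(precision, 2))
```

At this hypothetical operating point, fewer than a quarter of the people flagged as gay actually are, even though the classifier looks impressive on pairwise comparisons.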

As technology writer Hal Hodson pointed out on Twitter, if someone wanted to scan a crowd and identify a small number of individuals who were likely to be gay (while ignoring the many other people in the crowd who are also gay), this might work (with some false positives, of course).


Probably someone who wanted to do that would be up to no good, like an oppressive government or Amazon, and they would have better ways of finding gay people (like at pride parades, or looking on Facebook, or dating sites, or Amazon shopping history directly — which they already do of course). Such a bad actor could also train people to identify gay people based on many more social cues; the researchers here compare their computer algorithm to the accuracy of untrained people, and find their method better, but again that’s not a useful real-world comparison.

Aside: They make the weird but rarely-necessary-to-justify decision to limit the sample to White participants (and also offer no justification for using the pseudoscientific term “Caucasian,” which you should never ever use because it doesn’t mean anything). Why couldn’t respondents (or software) look at a Black person and a White person and ask, “Which one is gay?” Any artificial increase in the homogeneity of the sample will increase the likelihood of finding patterns associated with sexual orientation, and misleadingly increase the reported accuracy of the method used. And of course statements like this should not be permitted: “We believe, however, that our results will likely generalize beyond the population studied here.”

Some readers may be disappointed to learn I don’t think the following is an unethical research question: Given a sample of people on a dating site, some of whom are looking for same-sex partners and some of whom are looking for different-sex partners, can we use computers to predict which is which? To the extent they did that, I think it’s OK. That’s not what they said they were doing, though, and that’s a problem.

I don’t know the individuals involved, their motivations, or their business ties. But if I were a company or government in the business of doing unethical things with data and tools like this, I would probably like to hire these researchers, and this paper would be good advertising for their services. It would be nice if they pledged not to contribute personally to such work, especially any efforts to identify people’s sexual orientation without their consent.


Filed under Research reports

Teach it! Family syllabus supplements for Fall 2017

This year we were working on the second edition of my book The Family: Diversity, Inequality, and Social Change, which will be out in 2018. And my new book, a collection of essays, will also be out for Spring: Enduring Bonds: Inequality, Marriage, Parenting, and Everything Else That Makes Families Great and Terrible, from University of California Press. But I’ve still produced a few blog posts this year, so I can provide an updated list of potential syllabus supplements for this fall.

In addition to the excellent teaching materials to support The Family from Norton, there is also an active Facebook group for sharing ideas and materials (instructors visit here). And then I provide a list of blog posts for family sociology courses (for previous lists, visit the teaching page). So here are some new, and some old, organized by topic. As always, I appreciate your feedback.

1. Introduction

2. History

3. Race, ethnicity, and immigration

4. Social class

5. Gender

6. Sexuality

7. Love and romantic relationships

  • Is dating still dead? The death of dating is now 50 years old, and it’s been eulogized so many times that its feelings are starting to get hurt.
  • Online dating: efficiency, inequality, and anxiety: I’m skeptical about efficiency, and concerned about inequality, as more dating moves online. Some of the numbers I use in this post are already dated, but this could be good for a debate about dating rules and preferences.
  • Is the price of sex too damn low? To hear some researchers tell it in a recent YouTube video, women in general — and feminism in particular — have ruined not only sex, but society itself. The theory is wrong. Also, they’re insanely sexist.

8. Marriage and cohabitation

9. Families and children

10. Divorce, remarriage, and blended families

11. Work and families

12. Family violence and abuse

13. The future of the family


Filed under Me @ work

Donald is not the biggest loser (among winning and losing names)

From 2015 to 2016 there was a 10% drop in U.S. boys given the name Donald at birth, from 690 to 621, plunging the name from 900th to 986th in the overall rankings. Here is the trend in Donalds born from 1880 to 2016, shown on a log scale, from the Social Security names database.


That 2016 drop is relatively big in percentage terms, but the name has been dropping an average of 6% per year since 1957 (it dropped 26% in the 8 years after the introduction of Donald Duck in 1934). I really wish it were a popular name, so we could more easily see whether the rise of Donald Trump is a factor. With so few new Donalds, and the name already trending downward, there’s no way to tell whether Trump fanatics are counterbalancing regular people turned off by the name.

Stability over change

How big is a fall of 69 births, which seems so trivial in relation to the 3.9 million children born last year? Among names with more than 5 births in each year, only 499 fell more, compared with 26,052 that fell less or rose. So Donald is definitely a loser.

But I am always amazed at how little change there is in most names from year to year. It sounds obvious to describe a trend as rising or falling, but names are scarily regular in their annual changes given that the statistics from one year to the next reflect independent decisions by separate people who overwhelmingly don’t know each other.

Here is a way of visualizing the change in the number of babies given each name from 2015 to 2016. There is one dot for each name. Those below the diagonal had a decrease in births, those above had an increase; the closer to the line, the less change there was. (To adjust for the 1% drop in total births, these are shown as births per 1,000 total born.)

2015-2016 count change

No name had a change of more than 1700 births this year (Logan dropped 1697, a drop of 13%; Adeline increased 1700, or 71%). There just isn’t much movement. I find that remarkable. (Among top names, James stands out this year: 14,773 born in 2015, rising by 3 to 14,776 in 2016.)

Here’s a look at the top right corner of that figure, just showing names with 3 per 1,000 or more births in either 2015 or 2016:

2015-2016 count change 3per1000

Note that most of these top names became less popular in 2016 (below the diagonal). That fits the long-term trend, well known by now, for names to become less popular over time, which means name diversity is increasing. I described that in the history chapter of my textbook, The Family; and going back to this old blog post from 2011. (This great piece by Tristan Bridges explores why there is more diversity among female names, as you can see by the fact that they are outnumbered among the top names shown here.)

Anyway, since I did it, here are the top 20 winners and losers, in numerical terms, in 2016. Wow, look at that catastrophic 21% drop in girls given the name Alexa (thanks, Amazon). I don’t know what’s up with Brandon and Blake. Your explanations will be as good as mine for these.



For the whole series of name posts on this blog, follow the names tag, including a bunch on the name Mary.

Here’s the Stata code I used (not including the long-term Donald trend), including the figure and tables. The dataset is in a zip file at Social Security, here. There is a separate file for each year. The code below runs on the two latest files: yob2015.txt and yob2016.txt.

import delimited [path]\yob2016.txt
sort v2 v1
rename v3 count16
save "[path]\n16.dta", replace
import delimited [path]\yob2015.txt
sort v2 v1
rename v3 count15
merge 1:1 v2 v1 using [path]\n16.dta
drop _merge

gen pctchg = 100*(count16-count15)/count15
drop if pctchg==. /* drops cases that don't appear in both years (5+ names) */

gen countchg = count16-count15
rename v2 sex
rename v1 name

gsort -count16
gen rank16 = _n

gsort -count15
gen rank15 = _n

gsort -countchg
gen riserank=_n

gsort countchg
gen fallrank=_n

gen rankchg = rank15-rank16

format pctchg %9.1f 
format count15 count16 countchg %15.0fc

gen prop15 = (count15/3978497)*1000 /* these are births per 1000, based on NCHS birth report for 15 & 16 */
gen prop16 = (count16/3941109)*1000

*winners table
sort riserank
list sex name count15 count16 countchg pctchg rank15 rank16 rankchg in 1/20, sep(0)

*losers table
sort fallrank
list sex name count15 count16 countchg pctchg rank15 rank16 rankchg in 1/20, sep(0)

*figure for all names
twoway (scatter prop16 prop15 if sex=="M", mc(blue) m(Oh) mlw(vvthin)) (scatter prop16 prop15 if sex=="F" , m(Oh) mc(pink) mlw(vvthin))

*figure for top names
twoway (scatter prop16 prop15 if sex=="M" & (prop15>=3 | prop16>=3), ml(name) ms(i) mlabp(0)) (scatter prop16 prop15 if sex=="F" & (prop15>=3 | prop16>=3), ml(name) ms(i) mlabp(0))


Filed under Me @ work

Women’s Equality Day earnings data stuff and suffrage note

Tomorrow is Women’s Equality Day, which commemorates the day, in 1920, when U.S. women were granted the right to vote. (Asterisk: White women.)

One historical story

Congress finally passed a Constitutional amendment for women’s suffrage in 1918, after decades of activism. The suffrage movement in the end successfully made a few convincing arguments – and one clarification. The most important may have been that White women had proved their patriotism during the war, and so they finally deserved the vote. I wrote in 1996:

“No one thing connected with the war is of more importance at this time than meeting the reasonable demand of millions of patriotic and Christian women of the Nation that the amendment for woman suffrage be submitted to the states,” declared Representative James Cantrill. And, he added, “Right, justice, liberty and democracy have always been, and will always be, safe in the tender care of American womanhood.”

And you know what he meant by “American womanhood” (an image the mainstream suffrage movement encouraged to various degrees over the years):


American Progress, by John Gast (1872)

The important clarification was that women’s suffrage would absolutely not hurt White supremacy in the South. You know how it is when you just need that Southern vote. I went on:

If reluctant congressmen would only believe in the contribution of white women that was waiting to be made, suffrage advocates explained, the political math was irresistible. “There are more white women of voting age in the South to-day than there are negro men and women together,” [Congress’s only woman, Jeannette] Rankin said. Representative Scott Ferris assured them that poll taxes and literacy tests would remain untouched, so that “for every negro woman so enfranchised there will be hundreds and thousands of intelligent white women enfranchised” (Congressional Record 1918, 779). And Representative Thomas Blanton proclaimed, “So far as State rights are concerned, if this amendment sought to take away from any State the right of fixing the qualifications of its voters, I would be against it first, last, and all the time, but such it does not.” Although states should be allowed to set qualifications for voting, he believed, they could not do so at the expense of undermining true republicanism, and, “if you deny the 14,000,000 white women of this country the right to vote, you are interfering with a republican form of government [Applause]” (786). That day, the House passed the amendment with the required two-thirds vote.

Anyway, rights are rights, America is America, history is history (ha ha).

Some pay gap numbers

Back to nowadays. Today’s numbers come from some analysis of the gender earnings gap I did to support the Council on Contemporary Families brief for Women’s Equality Day. One big story is women’s rising education levels, especially BA completion.

In the active labor force as often described (age 25-54, working at least 20 hours per week and 26 weeks in the previous year), women surpassed men in BA completion in 2002:


That’s very good for women with regard to the earnings gap, because at every level of education men earn more than women. Women’s full-time full-year earnings are between 70% and 80% of men’s at all education levels except the highest, where they diverge: men who are doctors and lawyers earn much more than women, while women PhDs are doing relatively well. Here’s the 2015 breakdown by education:


With the education trend and differentials in mind, consider these multivariate model results. Going back to the sample of 25-54-year-old people working at least half time and half the year, here are two results. The first line, in blue, shows the gender earnings ratio when only age is controlled. It shows women gaining on men from 1992 to 2016, from 77% to 83%. This is not much progress for 25 years, but it’s the slow pace we’ve come to expect during that time. The other line shows results from a more complete model, which adds controls for education, race/ethnicity, marital status, and presence of children; it shows even less progress.


In the full model (orange line) the relative gains for women are not as great. (Note I don’t include occupation in the “full” model although that’s very important; occupation is also an outcome of gender, so I let its effect stay in the gender coefficient for descriptive purposes.)
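A guess about the mechanics, since the post doesn’t spell them out: ratios like these typically come from exponentiating the coefficient on a female dummy in a log-earnings regression. A sketch with hypothetical coefficients, chosen only to reproduce the 77% and 83% endpoints:

```python
import math

# If log(earnings) is regressed on a female dummy (plus controls), the
# implied gender earnings ratio is exp(coefficient). These coefficients
# are hypothetical, picked to land on the 77% and 83% endpoints above.
for beta in (-0.261, -0.186):
    print(f"beta = {beta:+.3f} -> ratio = {math.exp(beta):.0%}")
```

A coefficient around −0.26 corresponds to the 77% ratio, and around −0.19 to the 83% ratio.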

In the old days, when women had less education than men, controlling for education shrank the gap; now it appears the opposite is true. I haven’t done the whole decomposition to confirm this, but here’s another way to look at it. The next figure shows the same models, but in two separate samples, with and without BA degrees (and no control for education). The figure shows little progress within education groups. This implies it’s the increase in education for women that is driving the progress seen in the previous figure.


In conclusion: there is a substantial gender earnings gap at every level of education. The limited progress toward equality we’ve seen in the past 25 years may be driven by increases in women’s education.
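A toy illustration of that composition effect, with made-up earnings numbers: hold men’s earnings and the within-group gender ratio fixed, and just move more women into the BA group.

```python
# Composition effect: the overall gender ratio can rise with no change in
# within-group ratios, purely because women shift toward the higher-paid
# (BA) group. All earnings figures here are hypothetical.
ba_men, nonba_men = 100_000, 60_000
within_ratio = 0.78   # women earn 78% of men within each education group
men_share_ba = 0.30

men_mean = men_share_ba * ba_men + (1 - men_share_ba) * nonba_men

for women_share_ba in (0.25, 0.40):  # women's BA share rises over time
    women_mean = within_ratio * (women_share_ba * ba_men
                                 + (1 - women_share_ba) * nonba_men)
    print(women_share_ba, round(women_mean / men_mean, 3))
```

The overall ratio rises from about 76% to 82% even though women earn exactly 78% of men within each education group the whole time.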

There is a lot of other research on this — especially about segregation, which I didn’t include here — and a lot more to be done.

This is a little analysis, but if you’d like to do more, or see how I did what I’ve shown here, I posted the Stata code, data files, codebook, and spreadsheet on the Open Science Framework site here. You can use any of it for whatever you like, with a citation like this one, which the OSF generates:

Cohen, P. N. (2017, August 25). Gender wage gap analysis, 1992-2016. Retrieved from


Filed under Research reports