
Less than half of women with PhDs in survey keep ‘maiden’ names

Marital Name Change Survey first results and open data release.

Over the last three days, 3,400 ever-married U.S. residents took my Marital Name Change Survey. I distributed the survey link on this blog, Facebook, and Twitter. I don't know who took it, but based on the education and occupation data, a very large share of the respondents were women (88%) with professional degrees (30%) or PhDs (27%). It's not a representative sample, but the results may still be interesting.

Here I’ll give a few topline numbers as of 8:00 this morning, and then link to a public version of the data and materials. These results reflect a little data checking and cleaning and of course are subject to change.

Respondents were asked about their most recent marriage. Half were married in the 2010s, but the sample includes more than 400 people married in the 1990s and 200 married earlier.

[Figure: decade of most recent marriage]

The vast majority (84%) were women married to men; 11% were men married to women and 4% (~140) were in same-gender marriages. Here are some observations about the women married to men. The name-change choices are shown below, with “R change” indicating the respondent changed their name, and “Sp change” indicating their spouse changed. The “Other” field included a write-in, and the vast majority of those were variations on hyphenations or changes to middle names.

[Figure: name-change choices among women married to men]

Because of the convenience nature of the sample, I don't put much stock in the overall trend (I'll try to develop a weighting scheme for this, but even then it won't be representative). However, I think the PhD sample is worth looking at. Here is the trend for women with PhDs (now or at the time of marriage) married to men.

[Figure: trend in name-keeping among women with PhDs married to men]

By this reckoning, the feminist-name heyday was in the 1980s, followed by a backslide, and now a rebound of women with PhDs keeping their names. The 2010s trend is like that found in the Google Consumer Surveys poll reported by Claire Cain Miller and Derek Willis in the NYT's Upshot.

Note that these no-change rates are higher than those reported by Gretchen Gooding and Rose Kreider from the 2004 American Community Survey, which showed that 33% of married women with PhDs had different surnames than their husbands (regardless of when they got married). I show 53% in the 2000s had different names than their husbands, and 57% in the 2010s. Maybe that's because I have more social science and humanities PhDs, or just a more woke sample.

These results also show a strong age-at-marriage pattern, with PhD women much more likely to keep their names if they married at older ages. Over age 40, 74% of women with PhDs kept their names, compared with 20% of those who married under age 25. (Note this is based on education at the time of the survey; I also collected education at the time of marriage, which I discuss below.)

[Figure: name-keeping by age at marriage, women with PhDs]

I asked people how important various factors were when they considered changing their names. Among PhD women marrying men who did not change their names, the most important reasons were feminism (52% "very important"), professional considerations (34%), convenience (33%), and maintaining independence within the marriage (24%). Among those who took their husbands' names, the most important factors were the interests of their children (48%) and showing commitment to the marriage (25%).

A few other observations: PhD women were most likely to keep their names if they had no religion (53%), were Jewish (46%), or identified with another non-Christian religion (43%); Protestants (27%), Catholics (29%), and other Christians (21%) were less likely to keep their names. Finally, those who had lived together before marriage were most likely to keep their names (51% for those who lived together for three years or more, compared with 27% for those who did not live together at all).

Data availability

I don’t have time now to analyze this more, but that shouldn’t stop you. Feel free to download the data and documentation here under a CC-BY license (the only requirement is attribution). This includes a Stata data file, and PDFs of the questionnaire and codebook. This will all be revised when I have time.
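If you just want a quick start, something like this would open the Stata file and tabulate name-change choices by decade of marriage. The variable names here are hypothetical stand-ins; check the codebook for the real ones:

* hypothetical variable names; see the codebook for the actual ones
use mncs.dta, clear
tab decade namechange if woman == 1 & spouse_man == 1, row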

Open-ended responses

I am not including in the shared files (yet) the open-ended responses. These include descriptions of "other" name-change patterns, as well as a general notes field that is full of fascinating comments. Given the non-random nature of the survey, these may turn out to be its most valuable contribution.

Here are a few.

Reasons:

I changed my name to my spouses because I HATED my father and it was the easiest way to ditch his name. I kept my married name after divorce. I’m currently pregnant (on my own) and plan to change my name again and now I will take the surname of my step-father, who has been my “dad” since I was 5.

“True partnership”

My wife and I had been together 10 years and through several iterations of domestic partnerships prior to marrying. Including before she completed her PhD. I didn’t want to change my name because my name flows really poetically and a change would ruin it (silly but true). She didn’t want to change her name in part because it’s what everyone in her profession know her as. I think we both also feel like our names represent our life histories and although we are a true partnership, that doesn’t negate our family histories or experiences. Which I guess is feminist of us. But we never explicitly discussed feminism as an issue.

This is complicated.

My partner and I both had our own hyphenated names already! We kept our own hyphenated names initially (and our marriage was not legally recognized at the time so there wasn’t a built-in or convenient option to change at that point anyway). When we had kids, we gave them a hyphenated name, one of my last names and one of hers. Eventually we both changed to match the kids, so we all share the same hyphenated name now.

And so on. Fascinating reading!


Googling “Lost in Translation” versus “nativity scenes” across state political identities

Gallup’s tracking poll asks people “whether they describe their political views as liberal, moderate or conservative.” They released state rates for these identities yesterday. It’s useful for people who need current data on political characteristics of states. This is their map:

[Map: liberal, moderate, and conservative identification by state]

I put the conservative-liberal gap and the liberal-conservative gap into Google Correlate, to see which searches are correlated with each index. The results are quite similar to the ones I got in September 2016, when I did this with 538's predicted Clinton-versus-Trump margins. (Google doesn't say exactly what period the searches cover.)

Google gives you the top 100 most-correlated searches with each index you upload. These are the ones correlated with the conservative-liberal difference, with my coding categories:

[Table: searches most correlated with the conservative-liberal gap, with coding categories]

Here are the liberal-conservative correlated terms:

[Table: searches most correlated with the liberal-conservative gap, with coding categories]

Draw your own conclusions. Fun coding exercise. Remember, these aren't the most common searches in conservative or liberal states; they're the most correlated, meaning the most common ones in liberal states that are also uncommon in conservative states, and vice versa.

Google outputs the numbers as z-scores, with each search term having a mean of 0 and a standard deviation of 1. Once you have them for the two different sets of correlations, you can merge them together to see, for example, the negative correlation between searches for Lost in Translation (liberal) and nativity scenes (conservative):

[Scatterplot: "Lost in Translation" versus "nativity scenes" search z-scores, by state]

Or vegetarian food and cook steaks:

[Scatterplot: "vegetarian food" versus "cook steaks" search z-scores, by state]
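For what it's worth, here is a sketch of that merge in Stata, assuming you've saved each set of Google Correlate results as a CSV of state-level z-scores (the file and variable names are placeholders):

* each CSV assumed to have columns: state, zscore
import delimited lost_in_translation.csv, clear
rename zscore z_lit
tempfile lit
save `lit'
import delimited nativity_scenes.csv, clear
rename zscore z_nativity
merge 1:1 state using `lit', nogen
correlate z_lit z_nativity     // comes out negative across states
scatter z_lit z_nativity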

I did look at the "moderate" rates, and they yielded a bland list with nothing remarkable — things like "men's volleyball shoes" and "print screen." Also, Google couldn't find many search terms highly correlated with the "moderate" prevalence; the top correlation was only .73, versus .90-.92 for the liberals and conservatives. The average state is 36% moderate, but the category seems to mean little.

Anyway, I love this stuff. You can see a whole series of these under the Google tag.


Data analysis: Are older newlyweds saving marriage?

[COS Open Data and Open Materials badges]


Is the “institution” still in decline if the incidence of marriage rebounds, but only at older ages?

In my new book I’ve revisited old posts and produced this figure, which shows the refined marriage rate* from 1940 to 2015, with a discussion of possible futures:

[Figure: refined marriage rate, 1940-2015, with crash, rebound, and taper scenarios]

The crash scenario, showing marriage ending around 2050, is there to show where the 1950-2014 trajectory is headed (it's also a warning against using linear extrapolation to predict the future). The rebound scenario is intended to show how unrealistic the "revive marriage culture" people are. The taper scenario emerges as the most likely alternative; in fact, it's grown more likely since I first made the figure a few years ago, as you can see from the 2010-2014 jag.

So let’s consider the tapering scenario more substantively — what would it look like? One way to get a declining marriage rate is if marriage is increasingly delayed, even if it doesn’t become less common; people still marry, but later. (If everyone got married at age 99, we would have universal marriage and a very low refined marriage rate.) I give some evidence for this scenario here.

These trends are presented with minimal discussion; I’m not looking at race/ethnicity or social class, childbearing or the recession; I’m not discussing divorce and remarriage and cohabitation, and I’m not testing hypotheses. (This is a list of research suggestions!) To make the subject more enticing as a research topic (and for accountability), I’ve shared the Census data, Stata code, and spreadsheet file used to make this post in this OSF project. You can use anything there you want. You can also easily fork the project — that is, make a duplicate of its contents, which you then own, and take off on your own trajectory, by adding to or modifying them.

Trends

For some context, here is the trend in the percentage of men and women ever married, by age, since 1960. ("Ever married" means currently married, separated, divorced, or widowed.) This clearly shows both life-course delay and lifetime decline, but delay is much more prominent, at least so far. Even now, almost 90% of people have been married by age 60 or so, while the share ever married among people under 35 has plummeted.

[Figure: percentage ever married, by age and sex, 1960-2016]

People become ever-married when they first marry. We measure ever-married prevalence from a survey question on current marital status, but first-marriage incidence requires a question like the one the American Community Survey asks: "In the past 12 months, did this person get married?" Because the ACS also asks how many times each person has been married, you can calculate a first-marriage rate with this ratio:

(once married & married in the past 12 months) / (never married + (once married & married in the past 12 months))

Until recently it wasn't easy to measure first marriage across all ages; now that we have the ACS marital events data (available since 2008), we can. This allows us to look at the timing of first marriage, which means we can use current age-specific first-marriage rates to project lifetime ever-married rates under current conditions.
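For the record, here is roughly how that ratio can be computed from an IPUMS ACS extract in Stata. I'm assuming the standard IPUMS variables and codes (MARRNO counts marriages, MARRINYR = 2 for married in the past year, MARST = 6 for never married, PERWT is the person weight); verify these against your own extract:

use acs_extract.dta, clear
gen newlywed1 = (marrno == 1 & marrinyr == 2)   // first-married in the past 12 months
gen atrisk = (marst == 6) | newlywed1           // never married, plus the new first-marriers
collapse (sum) newlywed1 atrisk [fw=perwt], by(sex age)
gen fmrate = newlywed1 / atrisk                 // age-specific first-marriage rate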

Here are the first-marriage rates for men and women, by age. Each set of bars shows the trend from 2008 to 2016. The left side shows men, by age; the right side shows women, by age; the totals for men and women are in the middle. This shows that first-marriage rates have fallen for men and women under age 35, but increased for those over age 35. The total first-marriage rate has rebounded from the 2013 crater, but is still lower than in 2008.

[Figure: first-marriage rates by age and sex, 2008-2016]

This is a short-range trend, 9 years. It could be recession-specific, with people delaying marriage because of hardships, or relationships falling apart under economic stress, and then hurrying to marry a few years later. But it also fits the long-term trend of delay over decline.

The overall rates for men and women show that the 2014-2016 rebound has not brought first-marriage rates back to their 2008 level. However, what about lifetime odds of marriage? The next figure uses women’s age-specific first-marriage rates to project lifetime odds of marriage for three years: 2008, the 2013 crater, and 2016. This shows, for example, that at 2008 rates 59% of women would have married by age 30, compared with 53% in both 2013 and 2016.

[Figure: projected percentage of women ever married by age, at 2008, 2013, and 2016 rates]

The 2013 and 2016 lines diverge after age 30, and by age 65 the projected lifetime ever-married rates have fully recovered. This implies that marriage has been delayed, but not forgone (or denied).
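The projection itself is a life-table-style cumulation: the share still never married at each age is the running product of (1 - rate) over all younger ages, and the projected ever-married share is its complement. A sketch, continuing from the age-specific rates above for one sex and one year of data:

keep if sex == 2                                         // women
sort age
gen nevmar = 1 - fmrate if _n == 1
replace nevmar = nevmar[_n-1] * (1 - fmrate) if _n > 1   // running product of (1 - rate)
gen evermar_proj = 1 - nevmar                            // projected share ever married by each age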

Till now I've shown age- and sex-specific rates, but haven't addressed other things that might have changed in the never-married population. Finally, I estimated logistic regressions predicting first marriage among never-married men and women. The models include race, Hispanic origin, nativity, education, and age. In addition to the year and age patterns above, the models show that the other race groups have lower rates than Whites, Hispanics have lower rates than non-Hispanics, foreign-born people have higher rates (which explains the Hispanic result), and people with more education have higher first-marriage rates (code and results in the OSF project).
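The models are roughly of this form, with the adjusted rates coming from predictive margins; the variable names below are assumed, and the real code is in the OSF project:

* person-level file of the at-risk population: never married, plus the newly first-married
logit newlywed1 i.year i.agegrp i.race i.hispan i.fborn i.educ [pw=perwt]
margins year, atmeans    // adjusted first-marriage rates, covariates held at their means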

To see whether changes in these other variables change the story, I used the regressions to estimate first-marriage rates at the overall mean of all variables. These show a significant rebound from the bottom, but no return to 2008 levels, quite similar to the unadjusted trends above:

[Figure: adjusted first-marriage rates, estimated at the means of all variables]

This is all consistent with the taper scenario described at the top: marriage is delayed, which reduces the annual marriage rate, but later marriage picks up much of the slack, so the decline in lifetime marriage prevalence is modest.


* The refined marriage rate is the number of marriages as a fraction of unmarried people. This is more informative than the crude marriage rate (which the National Center for Health Statistics tracks), which is marriages as a fraction of the total population. In this post I use what I guess you would call an age-specific refined first-marriage rate, defined above.


Stop me before I fake again

In light of the news on social science fraud, I thought it was a good time to report on an experiment I did. I realize my results are startling, and I welcome the bright light of scrutiny that such findings might now attract.

The following information is fake.

An employee training program in a major city promises basic job skills as well as job search assistance for people with a high school degree and no further education, ages 23-52 in 2012. Due to an unusual staffing practice, new applicants were for a period in 2012 allocated at random to one of two caseworkers. One provided the basic services promised but nothing extra. The other embellished his services with extensive coaching on such "soft skills" as "mainstream" speech patterns, appropriate dress for the workplace, and a hard work ethic, among other elements. The program surveyed the participants in 2014 to see what their earnings were in the previous 12 months. The data provided to me does not include any information on response rates, or any information about those who did not respond. And it only includes participants who were employed at least part-time in 2014. Fortunately, the program also recorded which staff member each participant was assigned to.

Since this provides such an excellent opportunity for studying the effects of soft skills training, I think it’s worth publishing despite these obvious weaknesses. To help with the data collection and analysis, I got a grant from Big Neoliberal, a non-partisan foundation.

The data includes 1040 participants, 500 of whom had the bare-bones service and 540 of whom had the soft-skills add-on, which I refer to as the “treatment.” These are the descriptive statistics:

[Table: descriptive statistics]

As you can see, the treatment group had higher earnings in 2014. The difference in logged annual earnings between the two groups is statistically significant:

[Table: OLS regression results]

As you can see in Model 1, the Black workers in 2014 earned significantly less than the White workers. This gap of .15 logged earnings points, or about 15%, is consistent with previous research on the race wage gap among high school graduates. Model 2 shows that the treatment training apparently was effective, raising earnings about 11%. However, the interactions in Model 3 confirm that the benefits of the treatment were concentrated among the Black workers. The non-Black workers did not receive a significant benefit, and the treatment effect among Black workers basically wiped out the race gap.

The effects are illustrated, with predicted values, in this figure:

[Figure: predicted earnings by race and treatment status]

Soft skills are awesome.

I have put the data file, in Stata format, here.

Discussion

What would you do if you saw this in a paper or at a conference? Would you suspect it was fake? Why or why not?

I confess I never seriously thought of faking a research study before. In my day coming up in sociology, people didn’t share code and datasets much (it was never compulsory). I always figured if someone was faking they were just changing the numbers on their tables to look better. I assumed this happens to some unknown, and unknowable, extent.

So when I heard about the LaCour and Green scandal, I thought whoever did it was tremendously clever. But when I looked into it more, I thought it was not such rocket science. So I gave it a try.

Details

I downloaded a sample of adults 25-54 from the 2014 ACS via IPUMS, with annual earnings, education, age, sex, race and Hispanic origin. I set the sample parameters to meet the conditions above, and then I applied the treatment, like this:

First, I randomly selected the treatment group:

gen temp = runiform()
gen treatment=0
replace treatment = 1 if temp >= .5
drop temp

Then I generated the basic effect, and the Black interaction effect:

gen effect = rnormal(.08,.05)
gen beffect = rnormal(.15,.05)

Starting with the logged wage variable, lnwage, I copied it into a new variable, newlnwage, and added the basic effect to all the treated subjects:

gen newlnwage = lnwage
replace newlnwage = newlnwage + effect if treatment==1

Then I added the Black interaction effect to the treated Black subjects, and subtracted it from the non-treated ones:

replace newlnwage = newlnwage+beffect if (treatment==1 & black==1)
replace newlnwage = newlnwage-beffect if (treatment==0 & black==1)

This isn’t ideal, but when I just added the effect I didn’t have a significant Black deficit in the baseline model, so that seemed fishy.

That’s it. I spent about 20 minutes trying different parameters for the fake effects, trying to get them to seem reasonable. The whole thing took about an hour (not counting the write-up).

I put the complete fake files here: code, data.

Would I get caught for this? What are we going to do about this?

BUSTED UPDATE:

In the comments, ssgrad notices that if you exponentiate (unlog) the incomes, you get a funny list — some are binned at whole numbers, as you would expect from a survey of incomes, and some are random-looking and go out to multiple decimal places. For example, one person reports an even $25,000, and another supposedly reports $25,251.37. This wouldn't show up in the descriptive statistics, but it is kind of obvious in a list. Here is a list of people with incomes between $20,000 and $26,000, broken down by race and treatment status. I rounded to whole numbers because, even without the decimal points, you can see that the only people who report normal incomes are non-Blacks in the non-treatment group. Busted!
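The check itself is simple: real survey incomes unlog to whole-dollar amounts, while lnwage plus a rnormal() effect does not. Something like this flags the simulated cases (I'm assuming the variable names from the code above; the filename is a placeholder):

use fake_data.dta, clear
gen incwage = exp(newlnwage)
gen fishy = abs(incwage - round(incwage)) > .001   // not a whole-dollar amount
list incwage fishy black treatment if inrange(incwage, 20000, 26000)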

[Table: listed incomes between $20,000 and $26,000, by race and treatment status]

So, that only took a day — with a crowd-sourced team of thousands of social scientists poring over the replication file. Faith in the system restored?


What is ‘nationally representative,’ and did Regnerus have it?

I’m off to Minneapolis to present a talk tomorrow on “The Regnerus Affair” at the Minnesota Population Center, subtitle: “Gay Marriage, the Supreme Court, and the Politics of Sociology.”

In my preparation, I was putting together notes from previous posts, the critique I co-authored with Andrew Perrin and Neal Caren, the infamous paper itself, and the media coverage of the scandal. And one piece of it I never really questioned got me thinking: his insistence that his dataset was "a random, nationally-representative sample of the American population." The news media repeated this assertion routinely, but what does it mean?

The data, collected by Knowledge Networks, are definitely not truly random. But not much is. The firm has a standing panel of participants who get rewards for completing a certain number of online surveys. The recruitment of the original panel is where the randomness comes in, through dialing (more or less) random phone numbers. But who chooses to be in it is not random, of course. What the firm does, then, is apply weights to the sample. That is, you don't count each person as one person; you count them as a certain multiple of a person, so that the weighted total sample looks like the target population — in this case, all noninstitutionalized American adults ages 18-39.

In the paper, Regnerus offers an appendix which compares his New Family Structures Study to the national population as represented in better, larger samples, such as the Current Population Survey (CPS). He writes:

Appendix A presents a comparison of age-appropriate summary statistics from a variety of socio-demographic variables in the NFSS, alongside the most recent iterations of the Current Population Survey, the National Longitudinal Study of Adolescent Health (Add Health), the National Survey of Family Growth, and the National Study of Youth and Religion—all recent nationally-representative survey efforts. The estimates reported there suggest the NFSS compares very favorably with other nationally-representative datasets.

So, he eyeballs the comparisons and determines the result is “very favorable.” I had previously eyeballed the first few rows of that table and reached the same conclusion. This is the distribution of age, race/ethnicity, region and sex from that table:

[Table: age, race/ethnicity, region, and sex in the NFSS and CPS]

So, it looks very similar to the national population as counted by the benchmark CPS. But both of these surveys are weighted on these factors. That is, after the sample is drawn, they change the counts of people to make them match what we know from Census data (which are weighted, too, incidentally). So the fact that the NFSS matches the CPS on these characteristics just means they did the weights right, so far.

Think about it this way. If I collect data on 6 men and 4 women, it's easy to call my data "representative" of a 50/50 population: just weight the 6 men by .83 and the 4 women by 1.25, so each weighted group counts as 5. The more variables you try to match on, the harder the math gets, but the principle is the same.
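In code, that is just the target share divided by the sample share (post-stratification in miniature):

clear
set obs 10
gen woman = _n > 6                    // 6 men, 4 women
gen sampshare = cond(woman, .4, .6)   // shares in my sample
gen popshare = .5                     // suppose the target population is 50/50
gen wt = popshare / sampshare         // men get .83, women get 1.25
tab woman [aw=wt]                     // weighted sample is now 50/50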

But now I looked further down the table, and Regnerus’s data don’t compare “very favorably” to the national data on some other variables. Here are household income (from CPS) and self-reported health (from the National Survey of Family Growth):

[Table: household income in the NFSS and CPS]

[Table: self-reported health in the NFSS and NSFG]

This means that, when you apply the weights to the NFSS data, which produces comparable distributions on age, sex, race/ethnicity, and region, you get a sample that is quite a bit poorer and less healthy than the national average as represented by the better surveys.

I was confused by this partly because, according to the Knowledge Networks documentation for the NFSS, income was one of the weighting variables.

I don’t know how big an issue this is. Do you? And do you know of a standard by which a researcher or research firm can declare data “nationally representative” in this age of small, fast, low-response, online surveys?

