SocArXiv in development


Readers of the blog have become familiar with my complaints about our publishing system (scan the academia tag for examples): it’s needlessly slow, inefficient, hierarchical, profit-driven, exploitative, and also doesn’t work well.

Simple example: a junior scholar sends a perfectly reasonable sociology paper to a high-status journal. The editor commissions three anonymous reviews, and four months later the paper is rejected on the basis of a few hours of the reviewers’ volunteer labor. This increases the value (and subscription price) of the for-profit journal, because its high rejection rate is a key selling point. The author will now revise the paper (some of the advice was good, but there was nothing to suggest the analysis or conclusions were actually wrong) and send it to another journal, where three more anonymous reviewers, having no access to the previous round of review and exchange, will donate a few hours’ labor to a different for-profit publisher. In a few months we’ll find out what happens. Repeat. The outcome will be a good paper, improved by the process, published 1-3 years after it was written, during which time the paper, the code, and the data were not available to anyone else. It will be available for $39.95 to non-academics, but most of the people who are aware of it will be able to read it because their institutions buy it as part of a giant bundle of journals from the publisher. The writer may get a job and, later, tenure. Thus, the process produces a good paper, inaccessible to most of the world, as well as a person dependent on the process, one with the institutional position and incentive to perpetuate it for another generation. There’s more wrong than this, but that’s the basic idea. The system is not completely non-functional, it’s just very bad.

With current technology, replacing our outdated journal system is not difficult. We could save vast amounts of money while providing free, faster access to research for everyone. Like our healthcare system, academic publishing is laboring under the weight of supporting its usurious middlemen. Getting them out of the way is a problem of politics and organization, not technology or cost. We academics do all the work already – research, writing, reviewing, editing – contributing our labor without compensation to giant companies that claim to be helping us get and keep our incredibly privileged jobs. But most of us are supported directly or indirectly by the state and our students (or their banks), not the journal publishers. We don’t need most of what the journal publishers do any more, and working for them is degrading our research, making it less innovative and transformative, less engaging and engaged, less open and accountable.


The people in math and physics developed a workaround for this system in arXiv, where people share papers before they are peer-reviewed. Other paper servers have arisen as well, including some run by universities and some run privately for profit, some in specific disciplines. But there is a need for a new general, open-access, open-source, paper server for the social sciences, one that encourages linking and sharing data and code, that serves its research to an open metadata system, and that provides the foundation for a post-publication review system. I hope that SocArXiv will enable us to save research from the journal system. Once it’s built, anyone will be able to use it to organize their own peer-review community, to select and publish papers (though not exclusively), to review and comment on each other’s work, and to discover, cite, value, and share research unimpeded. We will be able to do this because of the brilliant efforts of the Center for Open Science (which is already developing a new preprint server) and SHARE (“a free, open, data set about research and scholarly activities across their life cycle”).

And we hope you’ll get involved: sharing research, reviewing, moderating, editing, mobilizing. Lots to do, but the good news is we’re doing most of this work already.

SocArXiv won’t take over this blog, though. You can read more about the project, and see the steering committee, in the announcement of our partnership. For updates, you can follow us on Twitter or Facebook, or email to add your name to the mailing list. In fact, you can also make a tax-deductible contribution to SocArXiv through the University of Maryland here.

When your paper is ready, check


Filed under Me @ work

On Asian-American earnings

In a previous post I showed that generalizations about Asian-American incomes often are misleading, as some groups have above-average incomes and some have below-average incomes (also, divorce rates) and that inequality within Asian-American groups was large as well. In this post I briefly expand that to show breakdowns in individual earnings by gender and national-origin group.

The point is basically the same: This category is usually not useful for economic statistics, and should usually be dropped for data on specific groups when possible.

Today’s news

What’s new is a Pew report by Eileen Patten showing trends in race and gender wage gaps. The report isn’t focused on Asian-American earnings, but they stand out in their charts. This led Charles Murray, who is fixated on what he believes is the genetic origin of Asian cognitive superiority, to tweet sarcastically, “Oppose Asian male privilege!” Here is one of Pew’s charts:


The figure, using the Current Population Survey (CPS), shows Asian men earning about 14.5% more per hour than White men, and Asian women earning 11% more than White women. This is not wrong, exactly, but it’s not good information either, as I’ll argue below.

First a note on data

The CPS data is better for some labor force questions (including wages) than the American Community Survey, which is much larger. However, it’s too small a sample to get into detail on Asian subgroups (notice the Pew report doesn’t mention American Indians, an even smaller group). To do that I will need to activate the ACS, which is better for race/ethnic detail.

As a reminder, this is the “race” question on the 2014 American Community Survey, which I use for this post:


There is no “Asian” or “Pacific Islander” box to check. So what do you do if you are thinking, “I’m Asian, what do I check?” The question is premised on the assumption that this is not what you’re thinking. Instead, you choose from a list of national origins, which the Census Bureau then combines to make “Asian” (the first 7 boxes) and “Pacific Islander” (the last 3) categories. And you can check as many as you like, which is good because there’s a lot of intermarriage among Asians, and between Asians and other groups (mostly Whites). This is a lot like the Hispanic origin question, which also lists national origins, except that question is prefaced by the unifying phrase, “Is Person 1 of Hispanic, Latino, or Spanish origin?” before listing the options, each beginning with “Yes”, as in “Yes, Cuban.”

Although changes have not been announced, it is likely that future questions will combine the race and Hispanic-origin questions, and also preface the Asian categories with the umbrella term. This may mark the progress of getting Asian immigrants to internalize the American racial classification system, so that descendants from groups that in some cases have centuries-old cultural differentiation start to identify and label themselves as from the same racial group (who would have put Pakistanis and Japanese in the same “race” group 100 years ago?). It’s hard to make this progress, naturally, when so many people from these groups are immigrants — in my sample below, for example, 75% of the full-time, year-round workers are foreign-born.


The problem with the earnings chart Pew posted, and which Charles Murray loved, is that it lumps all the different Asian-origin groups together. That is not crazy but it’s not really good. Of course every group has diversity within it, so any category masks differences, but in my opinion this Asian grouping is worse in that regard than most. If someone argued that all these groups see themselves as united under a common identity that would push me in the direction of dropping this complaint. In any event, the diversity is interesting even if you don’t object to the Pew/Census grouping.

Here are two breakouts. The first is immigration. As I noted, 75% of the full-time, year-round workers (excluding self-employed people, like Pew does) with an Asian/Pacific Islander (Asian for short) racial identification are foreign born. That ranges from less than 4% for Hawaiians, to around 20% for the White+Asian multiple-race people, to more than 90% for Asian Indian men. It turns out that the wage advantage is mostly concentrated among these immigrants. Here is a replication of the Pew chart using the ACS data (a little different because I had to use FTFY workers), using the same colors. On the left is their chart, on the right is the same data limited to US-born workers.


Among the US-born workers the Asian male advantage is reduced from 14.5% to 4.2% (the women’s advantage is not much changed; as in Pew’s chart, Hispanics are a mutually exclusive category). There are some very high-earning Asian immigrants, especially Indians. Here are the breakdowns, by gender, comparing each of the larger Asian-American groups to Whites:


Seven groups of men and nine groups of women have hourly earnings higher than Whites’, while nine groups of men and seven groups of women have lower earnings. In fact, among Laotians, Hawaiians, and Hmong, even the men earn less than White women. (Note, in my old post, I showed that Asian household incomes are not as high as they look when they are compared instead with those of their local peers, because they are concentrated in expensive metropolitan markets.)

Sometimes when I have a situation like this I just drop the relatively small, complex group, which leads some people to accuse me of trying to skew results. (For example, I might show a chart that has Blacks in the worst position, even though American Indians have it even worse.)

But generalization has consequences, so we should use it judiciously. In most cases “Asian” doesn’t work well. It may make more sense to group people by regions, such as East-, South-, and Southeast Asia, and/or according to immigrant status.


Filed under In the news

Explain to me again how marriage is the problem here

This is one of those things you share with all your friends on social media.


Black married parents are 2.4-times more likely to be in poverty, are 2.1-times more likely to be unemployed, and have one-ninth the median net worth compared with White married parents. So explain to me again how marriage is the problem here.


The other day I picked on someone’s fact meme, and wondered what makes these things work, without offering a constructive alternative. I can’t answer the question I asked in that post (how old are the fathers of teen mothers’ children?), but I can answer some other questions about families and Black-White inequality. So that’s what I did.

Feel free to take these facts (or any others) and make something better.


Here are my sources:

Poverty: 2014 American Community Survey. It’s Black and White, non-Hispanic, householders who are married and have their own children in the household. The poverty rates were 5% for White married parents and 11.9% for Black married parents. The poverty variable goes from 0 to 501, with 0-99 being below the poverty line, so you specify the recode like this: poverty(r:0-99 "poor"; 100-501 "not poor"). Here’s how you fill out the boxes in the online analysis tool:


Unemployment: Again, 2014 American Community Survey. It’s Black and White, non-Hispanic, householders who are married and have their own children in the household. For this one you limit it to people in the labor force (empstat(1-2)) to get the unemployment rate. I did it for men and women combined, getting unemployment rates of 3.1% for White married parents and 6.6% for Black married parents. The numbers are higher for women (3.7% versus 7.3%) but the Black/White ratio is a little worse for men (2.6% versus 5.8%). Here’s how:


Median net worth: I used the Survey of Consumer Finances from 2013, available here. These are also non-Hispanic Black and White parents living with children. The median net worths were $150,500 for Whites and $16,000 for Blacks (Hispanics, incidentally, have $18,750, and the rest are just coded “other”). This data set combines married people with those who are “living with partner,” so this comparison includes cohabitors. (I don’t know how that affects the results, but I’m sure there’s still lots of inequality.) I put my STATA code in an Open Science Framework project here, so feel free to play with it yourself.
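For anyone who wants to check the arithmetic behind the three comparisons, here is a minimal sketch in Python. It only uses the figures quoted in the sources above; the variable names are mine, and this is not the Stata code from the Open Science Framework project.

```python
# Figures quoted in the sources above (married parents, non-Hispanic).
poverty = {"White": 5.0, "Black": 11.9}          # percent below the poverty line
unemployment = {"White": 3.1, "Black": 6.6}      # percent of the labor force
net_worth = {"White": 150_500, "Black": 16_000}  # median net worth, in dollars

# Ratios of Black to White rates for poverty and unemployment.
poverty_ratio = poverty["Black"] / poverty["White"]
unemployment_ratio = unemployment["Black"] / unemployment["White"]

# For net worth the comparison runs the other way (White divided by Black).
net_worth_ratio = net_worth["White"] / net_worth["Black"]

print(f"Poverty ratio: {poverty_ratio:.1f}")
print(f"Unemployment ratio: {unemployment_ratio:.1f}")
print(f"White/Black median net worth ratio: {net_worth_ratio:.1f}")
```

The net worth ratio comes out a little over nine, which is where the "one-ninth" in the graphic comes from.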

Filed under In the news

The fathers behind teen births (or, statistical memes and motivated blind trust)

What makes people trust statistical memes? I don’t know of any research on this, but it looks like the recipe includes a combination of scientific-sounding specificity, good graphics, a source that looks credible, and – of course – a number that supports what people already believe (and want their Facebook friends to believe, too).

If that’s the problem, and assuming the market can’t figure out how to make journalism work, I have no solution except seizing the Internet and putting it under control of the Minister of Sociology, or, barring that, encouraging social scientists to get engaged, help reporters, and make all their good work available publicly, free, and fast.

Today’s cringe:


The blogger TeenMomNYC takes credit for creating this, and the Facebook version has been shared tens of thousands of times. Its popularity led to this story from Attn: “The Truth About Teenage ‘Baby Mamas’ is Quite Revealing.” (If anyone did want to study this issue, this is a neat case study, because she posted 8 “did you know” graphics on Facebook at the same time, and none of the others took off at all – why?)

I don’t know anything about TeenMomNYC, but I share her desire to stop stigmatizing and shaming young mothers. I wish her work were not necessary, but I applaud the effort. That said, I don’t necessarily think shaming young fathers (even if they’re not quite as young) is a solution to that, but that’s not the point. My point is, what is this statistic?

According to the footnote (thanks!), it comes from this 1995 National Academies report, and (except for changing “29” to “29.7”) it represents it accurately. From p. 205:

These data highlight an additional component of the sexual abuse picture— the evidence that an appreciable portion of the sexual relationships and resulting pregnancies of young adolescent girls are with older males, not peers. For example, using 1988 data from the NSFG and The Alan Guttmacher Institute, Glei (1994) has estimated that among girls who were mothers by the age of 15, 39 percent of the fathers were ages 20–29; for girls who had given birth to a child by age 17, the comparable figure was 53 percent. Although there are no data to measure what portion of such relationships include sexual coercion or violence, the significant age difference suggests an unequal power balance between the parties, which in turn could set the stage for less than voluntary sexual activity. As was recently said at a public meeting on teen pregnancy, “can you really call an unsupervised outing between a 13-year-old girl and a 24-year-old man a ‘date’?”

This is an important point, and was good information in 1995, when it cited a 1994 analysis of 1988 data, which asked women ages 15-44 a retrospective question. In other words, this refers to births that took place as early as 1958, or between 28 and 58 years ago. That is historical, and really shouldn’t be used like this today, given how much has changed regarding teen births.

The analysis is of the 1988 National Survey of Family Growth, a survey that was repeated as recently as 2011-2013. Someone who knows how to use NSFG should figure out the current state of the age gap between young mothers and fathers and let TeenMomNYC know.

Even if I didn’t know the true, current statistic, this would give me pause. Births to women before age 15 are extremely rare. The American Community Survey, which asks millions of women whether they have had a birth in the previous year, does not even ask the question of women younger than 15. The ACS reports there were 179,000 births in the previous year among women who were under 20 when interviewed, of which only 6,500 were to women age 15 at the interview. So that’s 3.7% of teen births, and 3 out of every thousand 15-year-old women. In 1958 this was much more common, and the social environment was much different.

Another issue is the age range of the fathers, 20-29, which is very wide when dealing with such young mothers. Look at the next phrase from the 1995 report: “girls who had given birth to a child by age 17, the comparable figure was 53 percent.” Realize that the great majority of girls who had a birth “by age 17” were 17 when they did, and the great majority of those men were probably close to 20. I’m not very positive about 20-year-old men having children with 17-year-old women, but it’s pretty different from 29-versus-13.

I can’t find the original source for this, but this report from the Resource Center for Adolescent Pregnancy Prevention attributes this table to the California Center for Health Statistics in 2002, which shows that the father was age 20 or older for 23% of women who had a birth before age 15. And of those, 93% were 20-24 (rather than 25+).


Anyway, this is a good case of a well-intentioned but under-resourced effort to sway people with true information, picked up by click-bait media and repeated because people think it will help them win arguments, not because they have any real reason to believe it’s true (or not true).

So I really hope someone with the resources, skills, and training to answer this question will produce the real numbers regarding father’s age for teen births, and post them, with accompanying non-technical language, along with their code, on the Open Science Framework (or other open-access repository).

Fixing the media and its economy is a tall order, but academics can do better if we put our energy into this work, reward it, and restructure our own system so that good information gets out better, faster and more reliably.


Filed under In the news

Black women really do have high college enrollment rates (at age 25+)

The other day I reported on the completely incorrect meme that Black women are the “most educated group” in the U.S. That was a simple misreading of a percentage term on an old table of degree attainment, which was picked up by dozens of news-repeater websites. Too many writers/copiers and editors/selectors don’t know how to read or interpret social statistics, so this kind of thing happens when the story is just too good to pass up.

I ignored another part of those stories, which was the claim that Black women have the highest college enrollment rates, too. This is more complicated, and the repeated misrepresentation is more understandable.

Asha Parker in Salon wrote:

By both race and gender there is a higher percentage of black women (9.7 percent) enrolled in college than any other group including Asian women (8.7 percent), white women (7.1 percent) and white men (6.1 percent), according to the 2011 U.S. Census Bureau.

You know the rewrite journalists are playing telephone when they all cite the same out-of-date statistics. (That Census report comes out every year — here’s the 2014 version; pro-tip: with government reports, try changing the year in the URL as a shortcut to the latest version.)

But is that true? Sort of. Here I have to blame the Census Bureau a little, because on that table they do show those numbers, but what they don’t say is that 9.7% (in the case of Black women) is the percentage of all Black “women” age 3 or older who are attending college. On that same table you can see that about 2% of Black “women” are attending nursery school or kindergarten; more relevant, probably, is the attendance rate for those ages 3-4, which is 59%.

So it’s sort of true. Particularly odd on that table is the low overall college attendance rate of Asian women, who are far and away the most likely to go to college at the “traditional” college ages of 18-24. That’s because they are disproportionately over age 25 (partly because many have immigrated as adults). But, if you just limit the population to those ages 18-54, Black women still have the highest enrollment rates: 15.5%, compared with 14.6% for Asians, 12.6% for Hispanics, and 12.4% for Whites. Asians are just the most likely to be over 25 and not attending college, most of them having graduated college already.

This does not diminish the importance of high enrollment rates for Black women, which are real — after age 25; the pattern is interesting and important. Here it is:


Under age 25, Black women are the least likely to be in college, over 25 they’re the most likely. This really may say something about Black women’s resilience and determination, but it is not a feel-good story of barriers overcome and opportunity achieved. And, despite her presence in the videos and stories illustrating this meme, it is not the story of Michelle Obama, who had a law degree from Harvard at age 24.

This is part of a pattern in which family events are arrayed differently across the life course for different race/ethnic groups, and the White standard is often mistaken as universal. I have noted this before with regard to marriage (with more Black women marrying at later ages) and infant mortality (with Black women facing the lowest risk of infant death when they have children young). It’s worth looking at more systematically.

ADDENDUM 6/29/2016: Cumulative projected years of higher education

If you take the proportion of women enrolled in each age group, multiply it by the number of years in the age group (so, for example, 18-19 is two years), and sum up those products, you can get a projected total years in college (including graduate school) for each group of women. It looks like this:


Note this makes the unreasonable assumption that everyone who says they are enrolled in college in an October survey attends college for a full year. So, for example, Asian women are projected to spend 6.2 years in college on average between ages 18 and 54. What’s interesting here is that Black women are projected to spend more years in higher education than White women (5.5 versus 4.9). But we know they are much less likely than White women to end up with a bachelor’s degree (currently 23% versus 33%). This has to be some combination of Black women not spending full years in college, not going to school full time, or not completing bachelor’s degrees after however many years in school. Attendance may be an indicator of resilience or determination, but it’s not as good an indicator of success.
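The calculation in this addendum can be sketched in a few lines of Python. The function name and the enrollment proportions below are hypothetical, made up only to show the mechanics; they are not the Census figures behind the chart.

```python
def projected_years(enrollment_by_age_group):
    """Sum of (proportion enrolled) x (years spanned by the age group).

    Keys are (youngest age, oldest age) tuples; values are the proportion
    of the group enrolled in college at those ages.
    """
    total = 0.0
    for (low, high), proportion in enrollment_by_age_group.items():
        width = high - low + 1  # e.g., ages 18-19 span two years
        total += proportion * width
    return total


# Hypothetical enrollment proportions, for illustration only.
example_rates = {
    (18, 19): 0.50,
    (20, 21): 0.45,
    (22, 24): 0.25,
    (25, 29): 0.12,
    (30, 34): 0.08,
    (35, 54): 0.03,
}

years = projected_years(example_rates)
print(f"Projected years in college, ages 18-54: {years:.2f}")
```

Note how the wide 35-54 group contributes as much to the total as a much higher rate in a narrow group, which is why enrollment after age 25 matters so much in this measure.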


Filed under In the news

Life table says divorce rate is 52.7%

After the eternal bliss, there are two ways out of marriage: divorce or death.

I have posted my code and calculations for divorce rates using the 2010-2012 American Community Survey as an Open Science Framework project. The files there should be enough to get you started if you want to make multiple-decrement life tables for divorce or other things.

Because the American Community Survey records year of marriage, and divorce and widowhood, it’s perfectly set up for a multiple-decrement life table approach. A multiple-decrement life table uses the rate of each of two exits for each year of the original state (in this case marriage), to project the probability of either exit happening at or after a given year of marriage. It’s a projection of current rates, not a prediction of what will happen. So, if you write a headline that says, “your chance of divorce if you marry today is 52.7%,” that would be too strong, because it doesn’t take into account that the world might change. Also, people are different.

The divorce rate of 52.7% can accurately be described like this: “If current divorce and widowhood rates remain unchanged, 52.7% of today’s marriages would end in divorce before widowhood.” Here is a figure showing the probability of divorce at or after each year of the model:


So there’s 52.7% up at year 0. Marriages that make it to year 15 have a 30% chance of eventually divorcing, and so on.
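For readers who want to see the mechanics, here is a minimal sketch of the projection in Python. It is not the code posted on the Open Science Framework; the function name is mine, the rates are treated as simple annual exit probabilities, and marriages that outlast the table are assumed to end in widowhood. Those are all simplifying assumptions.

```python
def eventual_divorce_probability(divorce_rates, widowhood_rates):
    """For each year of marriage x, return the probability that a marriage
    intact at the start of year x ends in divorce in year x or later.

    Both inputs are lists of annual exit probabilities among marriages
    still intact at the start of each year. Marriages that survive past
    the last year are assumed (for this sketch) to end in widowhood.
    """
    if len(divorce_rates) != len(widowhood_rates):
        raise ValueError("rate lists must have the same length")
    result = [0.0] * len(divorce_rates)
    later = 0.0  # chance of divorce after the final year in the table
    # Work backward from the last year of marriage to year 0.
    for x in range(len(divorce_rates) - 1, -1, -1):
        d, w = divorce_rates[x], widowhood_rates[x]
        stay = 1.0 - d - w  # still married at the end of year x
        later = d + stay * later
        result[x] = later
    return result


# Hypothetical two-year table, for illustration only.
example = eventual_divorce_probability([0.1, 0.2], [0.0, 0.1])
print(example)
```

The first element of the returned list is the year-0 figure (the counterpart of the 52.7% above), and each later element is the projection for marriages that have lasted that long.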

Because the ACS doesn’t record anything about the spouses of divorced or widowed people, I don’t know who was married to whom, or the spouse’s age, education, race-ethnicity, or even sex. So the estimates differ by sex as well as other characteristics. I estimated a bunch of them in the spreadsheet file on the OSF site, but here are the bottom lines, showing, for example, that second or higher-order marriages have a 58.5% projected divorce rate and Blacks have a 64.2% divorce rate, compared with 52.9% for Whites.


(The education ones should be taken with a grain of salt because education levels can change but this assumes they’re static.)

Check the divorce tag for other posts and papers on divorce.

The ASA-style citation to the OSF project would be like this:  Cohen, Philip N. 2016. “Multiple-Decrement Life Table Estimates of Divorce Rates.” Retrieved (


Filed under Me @ work

No, Black women are not the “most educated” group in the US

I don’t know where this started, but it doesn’t seem to be stopping. The following headlines are all completely factually wrong, and the organizations that published them should correct them right away:

The Root: Black Women Now the Most Educated Group in US

Upworthy: Black women are now America’s most educated group

Salon: Black women are now the most educated group in the United States

Good: Black Women Are Now The Most Educated Group In The U.S.

And then the video, by ATTN:, on Facebook, with 6 million views so far. I won’t embed the video here, but it includes these images, with completely wrong facts:



What’s true is that Black women, in the 2009-2010 academic year, received a higher percentage of degrees within their race/ethnic group than did women in any other major group. So, for example, of all the MA degrees awarded to Black students, Black women got 71% of them. In comparison, White women only got 62% of all White MA degrees. Here is the chart, from the data that everyone linked to (which is not new data, by the way, and has nothing to do with 2015):


For Black women to be the “most educated group,” they would have to have more degrees per person than other groups. In fact, although a greater percentage of Black women have degrees than Black men do, they have less education on average than White women, White men, Asian/Pacific Islander women, and Asian/Pacific Islander men.
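The difference between the two measures can be shown with a toy example in Python. The groups and counts below are entirely hypothetical, chosen only to show that women in one group can have a larger share of their group’s degrees while having a lower rate of degree-holding than women in another group.

```python
def womens_share_of_degrees(women_degrees, men_degrees):
    """Share of a group's degrees that are held by women."""
    return women_degrees / (women_degrees + men_degrees)


def attainment_rate(degree_holders, population):
    """Degrees per person within a population."""
    return degree_holders / population


# Hypothetical counts, for illustration only.
groups = {
    "Group A": {"women_pop": 1000, "women_deg": 230, "men_deg": 100},
    "Group B": {"women_pop": 1000, "women_deg": 380, "men_deg": 300},
}

summary = {}
for name, g in groups.items():
    summary[name] = {
        "share": womens_share_of_degrees(g["women_deg"], g["men_deg"]),
        "rate": attainment_rate(g["women_deg"], g["women_pop"]),
    }
    print(f"{name}: women hold {summary[name]['share']:.0%} of the group's "
          f"degrees; {summary[name]['rate']:.0%} of women have a degree")
```

In this made-up case, women in Group A hold a larger share of their group’s degrees only because the men in Group A have so few, not because the women have more education than women in Group B.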

Here are the percentages of each group that holds a BA degree or higher (ages 25-54), according to the 2010-2014 American Community Survey, with Black women highlighted:


23% of Black women ages 25-54 have BA degrees or more education, compared with 38% of White women. This does not mean Black women are worse (or that White women are better). It’s just the actual fact. Here are the percentages for PhD degrees:


Just over half of 1% of Black women have PhDs, compared with just over 1% of White women – and almost 3% of Asian/PI women. White women are almost twice as likely to have a PhD as Black women, and Asian/PI women are more than 5 times as likely.

Racism is racism, inequality is inequality, facts are facts. Saying this doesn’t make me racist or not racist, and it doesn’t change the situation of Black women, who are absolutely undervalued in America in all kinds of ways (and one of those ways is that they don’t have the same educational opportunities as other groups). There are some facts in these stories that are true, too. And of course, why Black women (and women in general) are getting more degrees than men are is an important question. But please don’t think it’s my responsibility to research and present all this information correctly before it’s appropriate for me to point out the obvious inaccuracy here. You don’t need this meme to do the good you’re trying to do by sharing these stories.

Our current information economy rewards speed and clickability. Journalists who know what they’re doing are more expensive and slower. Making good graphics and funny GIFs is a good skill, but it’s a different skill than interpreting and presenting information. We can each help a little by pausing before we share. And those of us with the skills and training to track these things down should all pitch in and do some debunking once in a while. For academics, there is little extra reward in this (as evidenced by my most recent, sub-par departmental “merit” review), beyond the rewards we already get for our cushy jobs, but it should be part of our mission.


Filed under In the news