COVID-19 mortality rates by race/ethnicity and age

Why are there such great disparities in COVID-19 deaths across race/ethnic groups in the U.S.? Here’s a recent review from New York City:

The racial/ethnic disparities in COVID-related mortality may be explained by increased risk of disease because of difficulty engaging in social distancing because of crowding and occupation, and increased disease severity because of reduced access to health care, delay in seeking care, or receipt of care in low-resourced settings. Another explanation may be the higher rates of hypertension, diabetes, obesity, and chronic kidney disease among Black and Hispanic populations, all of which worsen outcomes. The role of comorbidity in explaining racial/ethnic disparities in hospitalization and mortality has been investigated in only 1 study, which did not include Hispanic patients. Although poverty, low educational attainment, and residence in areas with high densities of Black and Hispanic populations are associated with higher hospitalizations and COVID-19–related deaths in NYC, the effect of neighborhood socioeconomic status on likelihood of hospitalization, severity of illness, and death is unknown. COVID-19–related outcomes in Asian patients have also been incompletely explored.

The analysis, interestingly, found that Black and Hispanic patients in New York City, once hospitalized, were less likely to die than White patients were. Lots of complicated issues here, but some combination of exposure through conditions of work, transportation, and residence; existing health conditions; and access to and quality of care. My question is more basic, though: What are the age-specific mortality rates by race/ethnicity?

Start tangent on why age-specific comparisons are important. In demography, breaking things down by age is a basic first-pass statistical control. Age isn’t inherently the most important variable, but (1) so many things are so strongly affected by age, (2) so many groups differ greatly in their age compositions, and (3) age is so straightforward to measure, that it’s often the most reasonable first cut when comparing groups. Very frequently we find that a simple comparison is reversed when age is controlled. Consider a classic example: mortality in a richer country (USA) versus a poorer country (Jordan). People in the USA live four years longer, on average, but Americans are more than twice as likely to die each year (9 per 1,000 versus 4 per 1,000). The difference is age: 23% of Americans are over age 60, compared with 6% of Jordanians. More old people means more total deaths, but compare within age groups and Americans are less likely to die. A simple separation by age facilitates more meaningful comparison for most purposes. So that’s how I want to compare COVID-19 mortality across race/ethnic groups in the USA. End tangent.
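To make the tangent concrete, here is a minimal sketch of how a crude death rate is just an age-weighted average of age-specific rates. The over-60 shares (23% and 6%) come from the paragraph above; the age-specific rates are invented for illustration and are not actual U.S. or Jordanian figures, and the two-group split is a simplification.

```python
# Share of each population over age 60 (from the text above).
share_over_60 = {"USA": 0.23, "Jordan": 0.06}

# HYPOTHETICAL age-specific death rates per 1,000 per year.
# These are invented for illustration only, chosen so that the
# "USA" rate is lower than the "Jordan" rate within each age group.
age_specific = {
    "USA": {"under_60": 2.0, "over_60": 30.0},
    "Jordan": {"under_60": 3.0, "over_60": 35.0},
}


def crude_rate(country):
    """Crude rate = age-specific rates weighted by population shares."""
    old = share_over_60[country]
    rates = age_specific[country]
    return (1 - old) * rates["under_60"] + old * rates["over_60"]


for country in ("USA", "Jordan"):
    print(f"{country}: crude rate {crude_rate(country):.2f} per 1,000")
```

With these made-up inputs the older population ends up with the higher crude rate even though its rate is lower in both age groups, which is the reversal described above.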

Age-specific mortality rates

It seems like this should be easier, but I can’t find anyone who is publishing them on an ongoing basis. The Centers for Disease Control posts a weekly data file of COVID-19 deaths by age and race/ethnicity, but they do not include the population denominators that you need to calculate mortality rates. So, for example, it tells you that as of December 5 there have been 2,937 COVID-19 deaths among non-Hispanic Blacks in the age range 30-49, compared with 2,186 deaths among non-Hispanic Whites of the same age. So, a higher count of Black deaths. But it doesn’t tell you there are 4.3-times as many Whites as Blacks in that category. So a much higher mortality rate.
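Here is a rough sketch of that arithmetic, using only the numbers in the paragraph above. The 4.3 population ratio is rounded, so the result only approximates what you would get from the full denominators; the function and variable names are just for illustration.

```python
# Death counts quoted above (CDC, as of December 5), ages 30-49.
black_deaths = 2937
white_deaths = 2186

# Population ratio quoted above (rounded): about 4.3 non-Hispanic Whites
# for every non-Hispanic Black person in this age range.
white_to_black_population = 4.3


def rate_ratio(deaths_a, deaths_b, pop_b_over_pop_a):
    """Ratio of group A's mortality rate to group B's.

    (deaths_a / pop_a) / (deaths_b / pop_b)
        == (deaths_a / deaths_b) * (pop_b / pop_a)
    """
    return (deaths_a / deaths_b) * pop_b_over_pop_a


count_ratio = black_deaths / white_deaths
ratio = rate_ratio(black_deaths, white_deaths, white_to_black_population)

print(f"Ratio of death counts: {count_ratio:.2f}")
print(f"Ratio of death rates:  {ratio:.1f}")
```

The counts alone differ by about a third, but once the population ratio is applied the rate gap is several times larger.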

On a different page, they report the percentage of all deaths in each age range that have occurred in each race/ethnic group, but they don’t include each group’s percentage in the population. So, for example, 36% of the people ages 30-39 who have died from COVID-19 were Hispanic, and 24% were non-Hispanic White, but that’s not enough information to calculate mortality rates either. I have no reason to think this is nefarious, but it’s clearly not adequate.

So I went to the 2019 American Community Survey (ACS) data to get some denominators. These are a little messy for two main reasons. First, ACS is a survey that asks people what their race and ethnicity are, while death counts are based on death certificates, for which the person who has died is not available to ask. So some people will be identified with a different group when they die than they would if they were surveyed. Second, the ACS and other surveys allow people to specify multiple races (in addition to being Hispanic or not), whereas death certificate data generally does not. So if someone who identifies as Black-and-White on a survey dies, how will the death certificate read? (If you’re very interested, here’s a report on the accuracy of death certificates, and here are the “bridges” they use to try to mash up multiple-race and single-race categories.)

My solution to this is to make denominators more or less the way race/ethnicity was defined before multiple race identification was allowed. I put all Hispanic people, regardless of race, into the Hispanic group. Then I put people who are White, non-Hispanic, and no other race into the White category. And then for the Black, Asian, and American Indian categories, I include people who were multiple race (and not Hispanic). So, for example, a Black-White non-Hispanic person is counted as Black. A Black-Asian non-Hispanic person is counted as both Black and Asian. Note I did also do the calculations for Native Hawaiian and Other Pacific Islanders, but those numbers are very small so I’m not showing them on the graph; they’re on the spreadsheet. Note also I say “American Indian” to include all those who are “non-Hispanic American Indian or Alaska Native.”
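The assignment rule above can be sketched roughly as follows. This is an illustration of the logic as described, not the actual code behind the spreadsheet; the function name, the labels, and the input format (a Hispanic flag plus a set of reported races) are all assumptions. The Native Hawaiian and Other Pacific Islander group is left out because its handling isn’t spelled out above.

```python
# Groups that include multiple-race (non-Hispanic) respondents.
MULTI_RACE_GROUPS = ("Black", "Asian", "American Indian")


def denominator_groups(hispanic, races):
    """Return the set of denominator groups a survey respondent counts toward.

    hispanic: True if the respondent identifies as Hispanic.
    races: set of race labels the respondent reported (hypothetical labels).
    """
    # All Hispanic people go in the Hispanic group, regardless of race.
    if hispanic:
        return {"Hispanic"}
    # White means non-Hispanic, White, and no other race.
    if races == {"White"}:
        return {"White"}
    # Everyone else counts toward each listed group they reported, so a
    # multiple-race person can count in more than one denominator.
    # Respondents reporting none of these groups return an empty set here.
    return {race for race in races if race in MULTI_RACE_GROUPS}


print(denominator_groups(False, {"Black", "White"}))
print(denominator_groups(False, {"Black", "Asian"}))
```

Because some people count in two groups, the denominators under this scheme do not sum to the total population.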

This is admittedly crude, but I suggest that you trust me that it’s probably OK. (Probably OK, that is, especially for Whites, Blacks, and Hispanics. American Indians and Asians have higher rates of multiple-race identification among the living, so I expect there would be more slippage there.)

Anyway, here’s the absolutely egregious result:

This figure allows race/ethnicity comparisons within the five age groups (under 30 isn’t shown). It reveals that the greatest age-specific disparities are actually at the younger ages. In the range 30-49, Blacks are 5.6-times more likely to die, and Hispanics are 6.6-times more likely to die, than non-Hispanic Whites are. In the oldest age group, over 85, where death rates for everyone are highest, the disparities are only 1.5- and 1.4-to-1 respectively.

Whatever the cause of these disparities, this is just the bottom line, which matters. Please note how very high these rates are at old ages. These are deaths per 100,000, which means that over age 85, 1.8% of all African Americans have died of COVID-19 this year (and 1.7% for Hispanics and 1.2% for Whites). That is — I keep trying to find words to convey the power of these numbers — one out of every 56 African Americans over age 85.
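For anyone who wants to check the conversion, here is a quick sketch that turns the rounded percentages above into deaths per 100,000 and "one out of every N" terms. The inputs are the rounded figures from the text, so the outputs are approximations of the underlying rates.

```python
# Share of each group over age 85 that has died of COVID-19 (rounded, from the text).
percent_died = {"Black": 1.8, "Hispanic": 1.7, "White": 1.2}


def per_100k(percent):
    """Convert a percentage to a rate per 100,000."""
    return round(percent * 1000)


def one_in_n(percent):
    """Convert a percentage to 'one out of every N', rounded to a whole person."""
    return round(100 / percent)


for group, pct in percent_died.items():
    print(f"{group}: {pct}% = {per_100k(pct)} per 100,000 = 1 in {one_in_n(pct)}")
```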

Please stay home if you can.

A spreadsheet file with the data, calculations, and figure, is here:

About Charles Murray: Is a White man’s cross burning as disqualifying as blackface?

“People are saying” that we need to think about how to interpret, and possibly punish, past racism, relative to current racism. This is as much about the meaning of “past” as it is about the meaning of “racism.” It’s about individual suspected racists — specifically leading Virginia Democrats — and about the intersection of individual and institutional racism, as preserved and displayed in yearbooks, as in this photo of the University of Illinois KKK chapter in 1924, which included representation from each fraternity on campus:

Politicians are a special case, because their authority is in theory dependent on the legitimating consent of the governed. On the other hand are regular individuals, for whom being labeled a racist is among the harshest reputational penalties we have. More important than individuals is how they add up to groups, organizations, and institutions.

Then there are powerful individuals representing institutional interests, such as Charles Murray, who spent decades on the dole of non-profit organizations funded by the foundations of the rich (in other words, you). He built an extremely influential career blaming poverty on inborn deficiencies (“born lazy”) among the Black poor and providing scientific cover for dismantling government support for meeting their needs.

Why burn that cross

In the grand scheme maybe it doesn’t matter whether Charles Murray (now an emeritus at age 76) is, or was, racist in his heart; his work was racist in its effects. (White supremacist terrorist Dylann Roof parroted Murray in his rationale for murdering Black people in church.) However, he and his defenders have always impugned those who assign racist motives to his work. He clearly believes in a biological racial hierarchy in genetic intelligence, which is an old-fashioned definition of racism. The new scientific racists, a coalition that includes Murray, defend themselves from that charge by claiming it’s not racist if it’s true, and it has fallen to human geneticists to debunk their claims. The charge of racism has always weakened the legitimacy of Murray and his compatriots, and narrowed their reach. As I think it should: you don’t need to know what was in his heart to think his work was terrible, but it’s relevant.

Shawn Fremstad reminded me that Murray and his friends burned a cross in 1960, which seems like a good thing to dredge up during racist-yearbook week. Here is the very cursory story, in a 1994 New York Times profile for the release of his book The Bell Curve.

While there is much to admire about the industry and inquisitiveness of Murray’s teen-age years, there is at least one adventure that he understandably deletes from the story — the night he helped his friends burn a cross. They had formed a kind of good guys’ gang, “the Mallows,” whose very name, from marshmallows, was a play on their own softness. In the fall of 1960, during their senior year, they nailed some scrap wood into a cross, adorned it with fireworks and set it ablaze on a hill beside the police station, with marshmallows scattered as a calling card.

[Denny] Rutledge recalls his astonishment the next day when the talk turned to racial persecution in a town with two black families. “There wouldn’t have been a racist thought in our simple-minded minds,” he says. “That’s how unaware we were.”

A long pause follows when Murray is reminded of the event. “Incredibly, incredibly dumb,” he says. “But it never crossed our minds that this had any larger significance. And I look back on that and say, ‘How on earth could we be so oblivious?’ I guess it says something about that day and age that it didn’t cross our minds.”

This is a very incomplete story, which doesn’t even tell us who first told the tale of the cross burning, or what reason that person gave for it, or how they picked the location. But reading this, my sociological opinion is that “dumb” is likely a dodge; and my sociological question is, if they had no idea of the “larger significance” of cross burning, in 1960, why do it? There were lots of dumb things to do. My sociological approach to this question is to investigate the context in which this cross burning occurred, both in the social environment and in Murray’s life course trajectory.

The fall of 1960, the beginning of Murray’s senior year of high school, was when he would have been applying to Harvard, which he went off to in 1961 (he was a history major). It was also a time when cross burning was in the news a lot, including in Iowa.

The 1960 Census recorded 15,000 people in the idyllic cross-burning town of Newton, where Murray’s father was a Maytag executive. And there were only 22 Black people recorded in Jasper County (where Newton is the principal city). Does this mean race was not an issue in the minds of Murray’s gang? I’m very doubtful. Blacks were a noticeable, and noticeably growing, presence in Iowa cities, including Des Moines, just 30 miles from Newton. (The new Interstate 80 hadn’t connected Newton and Des Moines yet, but sections of it were already built west of Des Moines, and it was penciled in on the map.) During the 1950s the state’s nonwhite population increased about 70%, from 17,000 to 29,000. In fact, the 1950s were the biggest decade for Black migration to Iowa. Almost all of them lived in urban areas, including Des Moines. The city had 209,000 people, of whom 10,700 (5%) were nonwhite (mostly Black) by 1960.

So, do you think a 1960 White executive’s family would have heard anything about the nonwhite population of the nearest city nearly doubling in the previous decade? Did aw-shucks Murray and his pool hall buddies know about all that big city stuff?

We have some other evidence from which to speculate. Murray traveled around the state, and even the country, in his high school years. He was on Newton High School’s “Crack Debate Team” that won several statewide tournaments, including one at the University of Iowa in Iowa City in April 1960. And that summer the debate team roadtripped to California, courtesy of the Chamber of Commerce, for a national tournament. (What did they debate, anyway?)

Picture of Newton debate team, including Charles Murray, in 1960. Des Moines Register, June 15, 1960.

So in 1960 Murray was the son of an executive, and a debate team champion, traveling the state and country, and applying to Harvard, while living in the next county over from a city with a booming Black population. Oh, and it was 1960: the year civil rights protesters staged sit-ins in dozens of cities across the South, from February through April.

By my count there were 55 articles in the Des Moines Register/Tribune archives mentioning cross burning during his high school years, 1955 to 1960. In fact, there were a number of stories about an Iowa City incident, where in April 1960 (yes, that April 1960), eight Beta Theta Pi frat brothers burned a cross on the lawn of the assistant director of student affairs, whose office was “instrumental in the effort to remove race restrictions from the constitutions of several fraternities at the university.” After briefly suspending the men, the university declared it a “prank” and reinstated them on probation:

Clips from Chicago Daily Tribune and Des Moines Tribune, April and May 1960

Maybe it was a pure coincidence having nothing to do with race that the eight frat brothers burned a cross in their “prank.” But why a cross? Also, it was a few weeks after students picketed stores right there in Iowa City to support the sit-ins.

Washington Times Herald article showing rash of cross burnings in the South, and mentioning picketers supporting sit-ins in Iowa City.

I see a possible parallel between the frat boys and the cross burning by Murray’s marshmallow gang. The story is they had no idea it was about race; decades later, this is the story they recite. Some key White adults helped keep the narrative from getting out of hand. I’d bet the incident didn’t make it into Murray’s Harvard admissions packet, either in his personal essay or in the form of a criminal record. Even though there was “talk” in town the next day.

And they went on about their lives. Murray isn’t an elected office holder, and may be retired. Maybe it’s water under the racist.

Incidentally, I noticed that one of the University of Iowa cross-burning frat boys, Joel E. Swanson, seems to have gone on to become a state district court judge. (I don’t know what happened to their disorderly conduct charge.) He was a freshman in 1960, got his law degree at the University in 1967, while serving in the National Guard, and worked as a lawyer in his home town of Lake City, eventually becoming a judge and then retiring in 2012. Also, they have recipes.


Teaching Black family history in sociology, student resistance edition

There is an amazing story from a family sociology class at the University of Tennessee. I don’t know the whole chronology of the reports, but I read pieces from As It Happens, BET, and the local news. The gist of it is that there was an ambiguous quiz question about Black slave families, and when a Black student named Kayla Renee Parker complained, it led to her making a rebuttal presentation to the class, and then the White instructor, Judy Morelock, going on an abusive, racist social media rant and getting fired.

Before the details, my conclusions:

  • Good test questions are important, and as a teacher it’s OK to admit you’re wrong or there is ambiguity.
  • Two things are true: Black families were devastated by slavery and as a generalization most Black children under slavery lived with both parents.
  • There is a line, but not a straight line, between Black families under slavery and those under today’s system of racial domination.
  • Students who do research, honestly engage the material, and bring passionate or political arguments to class should have their courage and commitment encouraged, not punished.
  • Some White people who say they are against racism, and maybe even are against racism, are also racist and hate students.
  • Social media is public, so expect consequences.

The story, and then my approach, follows.

The quiz

Here is the question at issue:

Historical research on African-American families during slavery shows that:

A) Family ties weren’t important in African cultures where the slaves ancestors originated; consequently, family bonds were never strong among slaves.

B) Two-parent families were extremely rare during the slave period.

C) Black family bonds were destroyed by the abuses of slave owners, who regularly sold off family members to other slave owners.

D) Most slave families were headed by two parents.

Parker chose C, but Morelock said the correct answer is D. In a back and forth that Parker put on her Facebook page, she pointed out that the textbook talked about “disruption of families through sale of family members,” and Morelock countered that “bonds were maintained among family members who were geographically separated” referring to people passing information between plantations. These are long-running and unsettled issues in the historical scholarship. If you revise answer C to read “bonds were often destroyed” then it is obviously true. If you take a legalistic approach you could say, “family bonds were destroyed” means all bonds, so C is incorrect. This is not a good argument for a teacher to have. Correct the ambiguity, figure out how to handle the points, take it as a teaching opportunity, and move on.

In fact, there appears to have been one good outcome, which was Parker making a very good presentation to the class (video in the As It Happens story). If that was the end of it, we never would have heard. Maybe it’s good that it wasn’t the end of it, though, because when Morelock’s Facebook posts came out we might agree it’s just as well that the incident led to her being fired. The posts are in the BET story, and include Morelock calling Parker (though not naming her), “ignorant simple-minded,” and threatening to ruin her reputation after the end of the semester, specifically saying, “I will post her name, her picture, and her bio on Facebook, Twitter, Instagram, and Linkedin. Count on it.” Wow. (She also says Parker was spreading “venomous rumors” about her, which I don’t see reported.)

Many teachers complain about their students on Facebook. If you have reasonable complaints, don’t compromise their identities, don’t reveal or advocate unprofessional or vindictive behavior, and don’t be really racist, I think this is ethically defensible. It’s like a teaching workshop, or talking about your job in the staff lounge. But it’s risky and if you screw up you can get fired (which might or might not be a good thing).

The key thing is always, “If there was a hidden camera here or someone hacked my account, would I be able to defend my behavior?” If the answer is yes, you might still be taking a risk to talk about students, but at least you can live with yourself.

Anyway, as far as what I see in the classroom video and Facebook post of her email exchange, I have nothing but kudos for Parker although I might argue with her a little, too. If she did bad things elsewhere, she shouldn’t have.

Classroom exchange

In Parker’s presentation, she quotes Frederick Douglass saying it was “common custom” where he was born “to part children from their mothers from a very early age.” This is good evidence in favor of Answer C. Obviously experiences varied dramatically across the slave system and over time. Throwing down over a generalization like “most” is not really worth it.

She added, “We continue to see those impacts today and that’s why I believe that family bonds were destroyed.” She says Morelock told her she can’t teach by anecdotes, and she countered that we have to pay attention to the stories of real people affected. This is a really good argument to have, in theory.

Parker recommends The New Jim Crow, and Slavery by Another Name, and she says of the present “it’s by a different name, it’s still slavery in itself. … Slavery is still continuing to destroy the Black family” because of the “prison industrial complex.” She cites an article by Rose Brewer, “Black Families Imperiled by Growth of Nation’s Prisons Industrial Complex.”

Finally Parker says Morelock recommended some books, one of which was a 1998 edition of Minority Families in the United States, by Ronald Taylor, which she said was good but should be more current.

It’s really an excellent presentation. If you care about educating students, this would make you happy (again, not knowing what else may have happened off camera). At the end Parker takes questions, and Morelock pipes up, saying in part (my transcript):

I don’t have a lot of recent books, because the publishers just don’t send us books the way they used to. And I’ve been using [Andrew] Cherlin [Public and Private Families] for many, many years, the book you have in this course. He says the same thing, and that book is in its seventh edition. If there had been additional sociological research since he wrote that book I would think that it would appear in it, but it doesn’t. So I have to go by what my discipline shows, and I understand no matter how much I revere and respect a historical figure like Frederick Douglass, who was absolutely one of the bravest, most articulate persons of his generation, and highly respected, I still have to go with what has been done systematically, the kind of systematic methods that did not exist at that time, when sociology was still in its infancy. So, in the 70s, you know, the research that was done, with historical documents, on Black families demonstrated that people forged bonds, this is written by sociologist Ronald Taylor, he also happens to be African American, I don’t think he would try to minimize the effects of slavery, which I never ever ever would myself, and he talks about studies here [she quotes Taylor on the strong bonds in Black families, and how they maintained them even when they were separated] … Nonetheless, as I said, no one has to accept the sociological point of view. All students in my class, as is always the case, are free to make up their own minds, in fact I encourage it, and I always encourage you to do as Kayla did, do more research, find out more information about a topic, and come to your own conclusions.

Aside from the giant red flag of calling Frederick Douglass “articulate,” this is a reasonable argument. Although it’s sad that Morelock doesn’t keep up with the literature, and her reliance on authority rather than reason and analysis is bad, the truth is her facts are pretty current. Even though she’s racist, it’s not her take on the history that makes her racist. The prison industrial complex is important but it’s not the same thing as slavery breaking up families, it’s a different but related thing. (Incidentally, Cherlin has a good newer book about working class families that addresses some of this; my review is here.)

It’s not surprising we’ve been arguing about this for a century or so. It’s complicated. Here is the trend, back to 1880, in the proportion of Black children ages 0-14 living with married parents. There are issues with the data and measurement, but this basic pattern holds: the share of Black children living with two married parents increased after the end of slavery, and fell a lot more later:

Figure: Black children living with married parents, 1880-2015

Of course, some students would also get mad if you said, “slavery destroyed all Black families,” which isn’t true either. I don’t agree with the first part of the BET headline, “Professor Denies Slavery Destroyed Black Families And Threatens Student Who Called Her Out,” but because the second part is true I have no interest in defending her.

My version

Anyone who teaches this material should wrestle with this. Here’s what I have in the first edition of my book, in the history chapter (there is much more current material in the subsequent chapter on race and ethnicity). I would be happy to hear your response to this:

Families Enslaved

African families had gone through their own transitions, of course, of a particularly devastating nature. From the arrival of the first slaves in Jamestown in 1619 until the mid-1800s, Africans were forcibly removed from their homelands in western and central Africa and subjected to the unspeakable horrors of the Middle Passage aboard slave ships, slave auctions, and ultimately the hardships of plantation labor in the American South (as well as in the Caribbean and South America). Because they were thrown together from diverse backgrounds, and because their own languages and customs were suppressed by slavery, we do not know how much of slave family life was a reflection of African traditions and how much was an adaptation to their conditions and treatment in America (Taylor 2000).

But there is no doubt that family life was one of the victims of the slave system. The histories that have come down to us feature heart-wrenching stories of family separation, including diaries that tell of children literally ripped from their mothers’ arms by slave traders, mothers taking poison to prevent themselves from being sold, and parents enduring barbaric whippings as punishment for trying to keep their families together (Lerner 1973). In fact, most slaves only had a given name with no family name, which made the formation and recognition of family lineages difficult or impossible (Frazier 1930). Slave marriage and parenthood were not legally recognized by the states, and separation was a constant threat. Any joy in having children was tempered by the recognition that those children were the property of the slave owner and could be sold or transferred away forever.

Nevertheless, most slaves lived in families for some or all of their lives. Most married (if not legally) and had children in young adulthood, and most children lived with both parents. This was especially the case on larger plantations rather than small farms, because slaves could carve out some protection for community life if they were in larger groups, and husbands and wives were more likely to remain together (Coles 2006). Even if they had families, however, African Americans for the most part were excluded from the emerging modern family practices described in the next section until after slavery ended.

Relevant references:

Coles, Roberta L. 2006. Race and Family: A Structural Approach. Thousand Oaks, CA: Sage.

Frazier, E. Franklin. 1930. “The Negro Slave Family.” The Journal of Negro History 15(2):198–259.

Lerner, Gerda. 1973. Black Women in White America: A Documentary History. New York: Vintage Books.

Taylor, Ronald L. 2000. “Diversity within African American Families.” In Handbook of Family Diversity, edited by David H. Demo, Katherine R. Allen, and Mark A. Fine, pp. 232–251. New York: Oxford University Press.

And in our teaching materials, we address it this way, with a multiple choice question:

Most African American slave children lived with: A. grandparents. B. unrelated adults.  C. one parent. D. both parents [D is correct].

And an essay question:

Describe the impact of slavery on the family structure of African Americans throughout U.S. history.

Answer guide: Students should address the lost customs and languages of diverse Africans brought as slaves. Social scientists are often unsure which of the resulting cultural features of African American family life are held over from African traditions and which are adaptations to slavery. Family lineage was difficult or impossible to trace. Separation of parents and children was common. After the Civil War, African American families were legally recognized, and some were reunited. Emerging African American families were more egalitarian in gender roles and had strong extended family and kinship networks.

This story has good lessons about a number of things that scare people who teach family sociology (and lots of other people, too): being wrong, being called racist, and getting fired for saying something on Facebook. Good chance to reflect on teaching, which is hard, but also great.

Quick correction on that 90-percent-of-faculty-are-White thing

The other day I saw a number of anti-racist people tweeting that “nearly 90% of full-time professors are White.” As I have previously complained when 90% of the full professors at my then-school (UNC) were White, I was interested to follow up. Unfortunately, that popular tweet turns out to be a stretched description of a simple error.

The facts are in this Education Department report from May, which was reported at the time by The Ed Advocate, and suddenly started going around the other day for unknown reasons. The “nearly 90%” is the Ed Advocate’s description of 84%, which is the percentage White among full-time full professors, which the original report in one place accidentally describes as just full-time professors. Among all full-time instructional faculty, in fact, 79% are White. So the headline, “Study: Nearly 90 Percent of Full-time Professors Are White,” was a conflation of two errors. It presumably became popular because it put a number to a real problem lots of people are aware of and looking for ways to highlight.

Here is the original chart:


The problem of White over-representation among college faculty is not that apparent in this national 79% statistic. Consider, for example, that among all full-time, full-year workers age 40 and older (my made-up benchmark), 71% are non-Hispanic White. Among those with a master’s degree or higher, 77% are White. So faculty, nationally and at all levels, don’t look that different from the pool from which they’re drawn.

The 84% full professor statistic reflects the greater White representation as you move up the academic hierarchy. And that’s not just a question of waiting for younger cohorts with more non-White faculty to age into the professoriate, because the pipeline isn’t working that well, especially for Black faculty. Which brings me back to my old UNC complaint, which focused mostly on Black under-representation. In 2010 I noted that the North Carolina population was 22% Black, while the UNC faculty was 4.7% Black. But full professors at UNC were just 2.4% Black, while the assistant professors were 7.5% Black. Is that the pipeline working? Well, only 4.5% of the recent faculty hires were Black.

I went back to check on things. As of the 2014 report (they’re all here), the update is that UNC has stopped reporting the numbers by rank, so now all they say is that 5.2% of all faculty are Black, and they don’t report the makeup of recent hires. So take from that what you will.

And what about further up the pipeline? I previously shared numbers showing a drop in Black representation among entering freshmen at the University of Michigan, from 10% to 5% over the 2000s. The trend at UNC is in the same direction:

[Figure: UNC Black students]

Of course we always need to be cautious about numbers that support what we already know or believe. Some people will respond to this by saying, “but the point remains.” Right, but if the number is irrelevant to the point, there’s no need to use the number. Plenty of people can say, “In all my undergraduate years, I never had a Black professor,” or some other highly relevant observation.*

On the other hand, others of us need to disabuse ourselves of the notion that progress on under-representation is just happening out there because everyone thinks it should and it’s just a matter of time. That common assumption allows defensive administrators to write things like this caption (from UNC’s 2011-2012 report):


This is misleading: There was a big increase in Hispanic students (North Carolina has a growing Hispanic population) and Asian students, and marked drops in Black and American Indian students. But “overall, steady increase” is an easy narrative to sell.

If they scaled that chart from 0 to 12 and dropped Whites, “overall, steady increase” would look like this:


* I think I had three great Black professors at Michigan: Walter Allen, Robin D. G. Kelley, and Cecilia Green, each of whom changed my life forever. Sorry if I’m forgetting someone.

How to illustrate a .61 relationship with a .93 figure: Chetty and Wilcox edition

Yesterday I wondered about the treatment of race in the blockbuster Chetty et al. paper on economic mobility trends and variation. Today, graphics and representation.

If you read Brad Wilcox’s triumphalist Slate post, “Family Matters” (as if he needed “an important new Harvard study” to write that), you saw this figure:


David Leonhardt tweeted that figure as “A reminder, via [Wilcox], of how important marriage is for social mobility.” But what does the figure show? Neither said anything more than what is printed on the figure. Of course, the figure is not the analysis. But it is what a lot of people remember about the analysis.

But the analysis on which it is based uses 741 commuting zones (metropolitan or rural areas defined by commuting patterns). So what are those 20 dots lying so perfectly along that line? In fact, that correlation printed on the graph, -.764, is much weaker than what you see plotted on the graph. The relationship you’re looking at is -.93! (thanks Bill Bielby for pointing that out).

In the paper, which presumably few of the people tweeting about it read, the authors explain that these figures are “binned scatter plots.” They broke the commuting zones into equally-sized groups and plotted the means of the x and y variables. They say they did percentiles, which would be 100 dots, but this one only has 20 dots, so let’s call them vigintiles.
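
To see how much this kind of binning can tighten a relationship, here is a minimal Python sketch on made-up data (not the Chetty et al. data). It assumes 709 simulated zones with an underlying correlation of roughly -.76, sorts them by x, splits them into 20 equal-sized groups, and compares the correlation of the raw points with the correlation of the group means. The function names `pearson` and `binned_means` are mine, for illustration only.

```python
import random


def pearson(xs, ys):
    """Plain (unweighted) Pearson correlation."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def binned_means(xs, ys, n_bins):
    """Sort points by x, cut into n_bins near-equal groups, return group means."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    bx, by = [], []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        bx.append(sum(p[0] for p in chunk) / len(chunk))
        by.append(sum(p[1] for p in chunk) / len(chunk))
    return bx, by


# Simulated data (an assumption, not the real commuting zones):
# y depends on x with enough noise to give a correlation near -.76.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(709)]
y = [-0.76 * xi + rng.gauss(0, 0.65) for xi in x]

bx, by = binned_means(x, y, 20)
r_raw = pearson(x, y)
r_binned = pearson(bx, by)

print("correlation across all 709 points:", round(r_raw, 3))
print("correlation across 20 bin means:  ", round(r_binned, 3))
```

Averaging within bins throws away the scatter around the line, so the bin means line up far more neatly than the underlying points do.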

In the process of analysis, this might be a reasonable way to eyeball a relationship and look for nonlinearities. But for presentation it’s wrong wrong wrong.* The dots compress the variation, and the line compresses it more. The dots give the misleading impression that you’re displaying the variance around the line. What, are you trying to save ink?

Since the data are available, we can look at this for realz. Here is the relationship with all the points, showing a much messier relationship, the actual -.76 (the range of the Chetty et al. figure, which was compressed by the binning, is shown by the blue box):

[Figure: Chetty scatter plot]

That’s 709 dots, one for each of the commuting zones for which they had sufficient data. With today’s powerful computers and high resolution screens, there is no excuse for reducing this down to 20 dots for display purposes.

But wait, there’s more. What about population differences? In the 2000 Census, these 709 commuting zones ranged in population from 5,000 (Southwest Jackson, Utah) to 16,000,000 (Los Angeles). Do you want to count Southwest Jackson as much as Los Angeles in your analysis of the relationship between these variables? Chetty et al. do in their figure. But if you weight them by population size, so each person in the population contributes equally to the relationship, that correlation that was -.76 (which they displayed as -.93) is reduced to -.61. Yikes.
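
For readers who want to see what weighting by population does mechanically, here is a minimal Python sketch of a weighted Pearson correlation. It is one common definition (each zone counts in proportion to its weight, as if it were repeated that many times), not necessarily the exact routine behind the numbers above. The function name `weighted_corr` and the four toy zones are hypothetical.

```python
def weighted_corr(xs, ys, ws):
    """Pearson correlation where each point counts in proportion to its weight."""
    total = sum(ws)
    mx = sum(w * x for x, w in zip(xs, ws)) / total
    my = sum(w * y for y, w in zip(ys, ws)) / total
    sxy = sum(w * (x - mx) * (y - my) for x, y, w in zip(xs, ys, ws))
    sxx = sum(w * (x - mx) ** 2 for x, w in zip(xs, ws))
    syy = sum(w * (y - my) ** 2 for y, w in zip(ys, ws))
    return sxy / (sxx * syy) ** 0.5


# Hypothetical toy zones (made-up values, not the Chetty et al. data):
# x = percent single-mother families, y = a mobility score, pop = population.
x = [10, 20, 30, 40]
y = [50, 45, 42, 30]
pop = [5_000, 50_000, 500_000, 16_000_000]

unweighted = weighted_corr(x, y, [1] * len(x))  # every zone counts the same
weighted = weighted_corr(x, y, pop)             # every person counts the same

print("unweighted:", round(unweighted, 3))
print("population-weighted:", round(weighted, 3))
```

With equal weights the function reduces to the ordinary correlation, and with whole-number weights it gives the same answer as literally repeating each zone that many times.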

Here is what the plot looks like if you scale the commuting zones according to population size (more or less, not quite sure how Stata does this):

[Figure: Chetty scatter plot, weighted]

Now it’s messier, and the slope is much less steep. And you can see that gargantuan outlier, which turns out to be the New York commuting zone: it has 12 million people and a lot more upward mobility than you would expect based on its family structure composition.

Finally, while we’re at it, we may as well attend to that nonlinearity that has been apparent since the opening figure. We can increase the variance explained from .38 to .42 by adding a quadratic term, to get this:

[Figure: Chetty scatter plot, weighted, with quadratic fit]

I hate to go beyond what the data can really tell. But — what the heck — it does appear that after 33% single-mother families, the effect hits its minimum and turns positive. These single mother figures are pretty old (when Chetty et al.’s sample were kids). Now that the country has surpassed 40% unmarried births, I think it’s safe to say we’re out of the woods. But that’s just speculation.**

*OK, OK: “wrong wrong wrong” is going too far. Absolute rules in data visualization are often wrong wrong wrong. Binning 709 groups down to 20 is extreme. Sometimes you have a zillion points. Sometimes the plot obscures the pattern. Sometimes binning is an inherent part of measurement (we usually measure age in years, for example, not seconds). None of that is an excuse in this case. However, Carter Butts sent along an example that makes the point well:


On the other hand, the Chetty et al. case is more similar to the following extreme example:

If you were interested in the relationship between age and earnings for a sample of 1,400 full-time, year-round women, you might start with this, which is a little frustrating:


The linear relationship is hard to see, but it’s about +$500 per year of age. However, the correlation is only .13, and the variance explained by linear-age alone is only 1.7%. But if you plotted the mean wage over ages, the correlation jumps to .68:


That’s a different question. It’s not, “how does age affect earnings,” it’s, “how does age affect mean earnings.” And if you binned the women into 10-year age intervals (25-34, 35-44, 45-54), and plotted the mean wage for each group, the correlation is .86.


Chetty et al. didn’t report the final correlation, but they showed it, even adding the regression line, so that Wilcox could call it the “bivariate relationship.”

**This paragraph was a joke that several people missed, so I’m clarifying. I would never draw a conclusion like that from the scraggly tail of a loose correlation like this.

Where is race in the Chetty et al. mobility paper?

What does race have to do with mobility? The words “race,” “black,” or “African American” don’t appear in David Leonhardt’s report on the new Chetty et al. paper on intergenerational mobility that hit the news yesterday. Or in Jim Tankersley’s report in the Washington Post, which is amazing, because it included this figure:

[Figure: post-race-mobility]

That’s not exactly a map of Black America, which the Census Bureau has produced, but it’s not that far off:

[Figure: census-black-2010]

But even if you don’t look at the map, what if you read the paper? Describing the series of maps of intergenerational mobility, the authors write:

Perhaps the most obvious pattern from the maps in Figure VI is that intergenerational mobility is lower in areas with larger African-American populations, such as the Southeast. … Figure IXa confirms that areas with larger African-American populations do in fact have substantially lower rates of upward mobility. The correlation between upward mobility and fraction black is -0.585. In areas that have small black populations, children born to parents at the 25th percentile can expect to reach the median of the national income distribution on average (y25;c = 50); in areas with large African-American populations, y25;c is only 35.

Here is that Figure IXa, which plots Black population composition and mobility levels for groups of commuting zones:

[Figure IXa]

Yes, race is an important part of the story. In a nice part of the paper, the authors test whether Black population size is related to upward mobility for Whites (or, people in zip codes that are probably White, since race isn’t in their tax records), and find that it is. It’s not just Blacks driving the effect. I’m thinking about the historical patterns of industrial development, land ownership, the backwardness of racist elites in the South, and so on. But they’re not. For some reason, not explained at all, Chetty et al. offer this pivot:

The main lesson of the analysis in this section is that both blacks and whites living in areas with large African-American populations have lower rates of upward income mobility. One potential mechanism for this pattern is the historical legacy of greater segregation in areas with more blacks. Such segregation could potentially affect both low-income whites and blacks, as racial segregation is often associated with income segregation. We turn to the relationship between segregation and upward mobility in the next section.

And that’s it, they don’t discuss Black population size again, instead only focusing on racial segregation. They don’t pursue this “potential mechanism” in the analysis that follows. Instead, they drop percent Black for racial segregation. I have no idea why, especially considering this Table VII, which shows unadjusted (and normalized) correlations (more or less) between each variable and absolute upward mobility (the variable mapped above):

[Table VII]

In these normalized correlations, fraction Black has a stronger relationship to mobility than racial segregation or economic segregation! In fact, it’s just about the strongest relationship on the whole long table (except for single mothers, with which it is of course highly correlated). So why do they not use it in their main models? Maybe someone else can explain this to me. (Full disclosure, my whole dissertation was about this variable.)

This is especially unfortunate because they do an analysis of the association between commuting zone family structure (using macro-level variables) and individual-level mobility, controlling for marital status — but not race — at the individual level. From this they conclude, “Children of married parents also have higher rates of upward mobility if they live in communities with fewer single parents.” I am quite suspicious that this effect is inflated by the omission of race at either level. So they write the following, which goes way beyond what they can find in the data:

Hence, family structure correlates with upward mobility not just at the individual level but also at the community level, perhaps because the stability of the social environment affects children’s outcomes more broadly.

Or maybe, race.

I explored the percent Black versus single mother question in a post a few weeks ago using the Chetty et al. data. I did two very simple OLS regression models using only the 100 largest commuting zones, weighted for population size: the first with just single motherhood, and then a model with proportion Black added. The comparison shows that the association between single motherhood rates and immobility is reduced by two-thirds, and is no longer significant at conventional levels, when percent Black is added to the model. That is: Percent Black statistically explains the relationship between single motherhood and intergenerational immobility across U.S. labor markets. That’s not an analysis, it’s just an argument for keeping percent Black in the more complex models. Substantively, the level of racial segregation is just one part of the complex race story: it measures one kind of inequality in a local area, but not the size of the Black population, which matters a lot (I won’t go into it all, but here are three old papers: one, two, three).
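
The logic of that two-model comparison can be sketched in Python on simulated data. Everything here is an assumption for illustration: the 100 zones are made up, mobility is constructed to depend on percent Black only, and single motherhood is constructed to be correlated with percent Black. The helper names `solve` and `wls` are mine; this is not the original model, data, or software.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def wls(columns, y, w):
    """Weighted least squares; returns [intercept, slope_1, slope_2, ...]."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    k = len(rows[0])
    xtwx = [[sum(wi * r[i] * r[j] for r, wi in zip(rows, w)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(wi * r[i] * yi for r, yi, wi in zip(rows, y, w))
            for i in range(k)]
    return solve(xtwx, xtwy)


# Simulated zones (assumptions, not real data).
rng = random.Random(1)
n = 100
pct_black = [rng.uniform(0.0, 0.4) for _ in range(n)]
single_mom = [0.15 + 0.5 * pb + rng.gauss(0, 0.03) for pb in pct_black]
mobility = [45 - 30 * pb + rng.gauss(0, 2) for pb in pct_black]  # no direct single_mom effect
pop = [rng.uniform(0.5, 10.0) for _ in range(n)]  # population weights, in millions

m1 = wls([single_mom], mobility, pop)             # model 1: single motherhood only
m2 = wls([single_mom, pct_black], mobility, pop)  # model 2: adds percent Black

print("single-mother slope, model 1:", round(m1[1], 2))
print("single-mother slope, model 2:", round(m2[1], 2))
```

In this constructed example the single-motherhood slope in the first model is picking up the omitted variable, so it shrinks once percent Black enters the model. Real data could of course behave differently, which is the reason to run both models.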

The burgeoning elite conversation about economic mobility, poverty, and inequality is good news. Its avoidance of race is not.