Tag Archives: inequality

How well do teen test scores predict adult income?

Now with new figures and notes added at the end!

The short answer is, pretty well. But that’s not really the point.

In a previous post I complained about various ways of collapsing data before plotting it. Although this is useful at times, and inevitable to varying degrees, the main danger is the risk of inflating how strong an effect seems. So that’s the point about teen test scores and adult income.

If someone told you that the test scores people get in their late teens were highly correlated with their incomes later in life, you probably wouldn’t be surprised. If I said the correlation was .35, on a scale of 0 to 1, that would seem like a strong relationship. And it is. That’s what I got using the National Longitudinal Survey of Youth. I compared the Armed Forces Qualification Test scores, taken in 1999, when the respondents were ages 15-19, with their household income in 2011, when they were 27-31.*

Here is the linear fit between these two measures, with the 95% confidence interval shaded, showing just how confident we can be in this incredibly strong relationship:


That’s definitely enough for a screaming headline, “How your kids’ test scores tell you whether they will be rich or poor.” And it is a very strong relationship – that correlation of .35 means AFQT explains 12% of the variation in household income.
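For anyone who wants to check the arithmetic: in a simple two-variable linear fit, the share of variance explained is just the correlation squared. A tiny sketch (the only input is the .35 reported above):

```python
# Correlation between test score and logged household income, as reported above.
r = 0.35

# In a bivariate linear fit, variance explained equals the squared correlation.
r_squared = r ** 2

print(f"variance explained:   {r_squared:.4f}")
print(f"variance left over:   {1 - r_squared:.4f}")
```

So roughly 12% is accounted for and roughly 88% is not.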

But take heart, ye parents in the age of uncertainty: 12% of the variation leaves a lot left over. This variable can’t account for how creative your children are, how sociable, how attractive, how driven, how entitled, how connected, or how White they may be. To get a sense of all the other things that matter, here is the same data, with the same regression line, but now with all 5,248 individual points plotted as well (which means we have to rescale the y-axis):


Each dot is a person’s life — or two aspects of it, anyway — with the virtually infinite sources of variability that make up the wonder of social existence. All of a sudden that strong relationship doesn’t feel like something you can bank on with any given individual. Yes, there are very few people from the bottom of the test-score distribution who are now in the richest households (those clipped by the survey’s topcode and pegged at 3 on my scale), and hardly anyone from the top of the test-score distribution who is now completely broke.

But I would guess that for most kids a better predictor of future income would be spending an hour interviewing their parents and high school teachers, or spending a day getting to know them as a teenager. But that’s just a guess (and that’s an inefficient way to capture large-scale patterns).

I’m not here to argue about how much various measures matter for future income, or whether there is such a thing as general intelligence, or how heritable it is (my opinion is that a test such as this, at this age, measures what people have learned much more than a disposition toward learning inherent at birth). I just want to give a visual example of how even a very strong relationship in social science usually represents a very messy reality.

Post-publication addendums

1. Prediction intervals

I probably first wrote about this difference between the slope and the variation around the slope two years ago, in a futile argument against the use of second-person headlines such as “Homophobic? Maybe You’re Gay.” Those headlines always try to turn research into personal advice, and are almost always wrong.

Carter Butts, in personal correspondence, offered an explanation that helps make this clear. The “you” type headline presents a situation in which you – the reader – are offered the chance to add yourself to the study. In that case, your outcome (the “new response” in his note) is determined by both the line and the variation around the line. Carter writes:

the prediction interval for a new response has to take into account not only the (predicted) expectation, but also the (predicted) variation around that expectation. A typical example is attached; I generated simulated data (N=1000) via the indicated formula, and then just regressed y on x. As you’d expect, the confidence bands (red) are quite narrow, but the prediction bands (green) are large – in the true model, they would have a total width of approximately 1, and the estimated model is quite close to that. Your post nicely illustrated that the precision with which we can estimate a mean effect is not equivalent to the variation accounted for by that mean effect; a complementary observation is that the precision with which we can estimate a mean effect is not equivalent to the accuracy with which we can predict a new observation. Nothing deep about that … just the practical points that (1) when people are looking at an interval, they need to be wary of whether it is a confidence interval or a prediction interval; and (2) prediction interval can (and often should be) wide, even if the model is “good” in the sense of being well-estimated.

And here is his figure. “You” are very likely to be between the green lines, but not so likely to be between the red ones.
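If you want to try this yourself, here is a rough sketch of the same idea in Python. To be clear, this is not Carter's code, and his formula is not reproduced in this post: the data-generating process below (a slope of 0.5 and a noise standard deviation of 0.25, chosen so the true prediction band is about 1 wide) is my assumption for illustration.

```python
import math
import random

random.seed(1)

# ASSUMED data-generating process (not Carter's actual formula):
# y = 0.5 * x + noise, noise sd 0.25, so the true 95% prediction band is ~1 wide.
N = 1000
TRUE_SLOPE, NOISE_SD = 0.5, 0.25
x = [random.uniform(0, 1) for _ in range(N)]
y = [TRUE_SLOPE * xi + random.gauss(0, NOISE_SD) for xi in x]

# Ordinary least squares with a single predictor.
mean_x = sum(x) / N
mean_y = sum(y) / N
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
intercept = mean_y - slope * mean_x

# Residual standard error, with N - 2 degrees of freedom.
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
resid_sd = math.sqrt(sse / (N - 2))

Z = 1.96  # normal approximation to the t quantile; close enough at N = 1000


def band_widths(x0):
    """Return (confidence width, prediction width) of the 95% bands at x0."""
    leverage = 1 / N + (x0 - mean_x) ** 2 / sxx
    conf_width = 2 * Z * resid_sd * math.sqrt(leverage)
    pred_width = 2 * Z * resid_sd * math.sqrt(1 + leverage)
    return conf_width, pred_width


def share_inside(use_prediction):
    """Share of the simulated points that fall inside the chosen band."""
    hits = 0
    for xi, yi in zip(x, y):
        conf_w, pred_w = band_widths(xi)
        half_width = (pred_w if use_prediction else conf_w) / 2
        hits += abs(yi - (intercept + slope * xi)) <= half_width
    return hits / N


conf_w, pred_w = band_widths(0.5)
print(f"confidence band width at x=0.5: {conf_w:.3f}")
print(f"prediction band width at x=0.5: {pred_w:.3f}")
print(f"points inside confidence band:  {share_inside(False):.1%}")
print(f"points inside prediction band:  {share_inside(True):.1%}")
```

The only difference between the two bands is the extra "1 +" under the square root, which is the variation of an individual around the line. That one term is what the "you" headlines leave out.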


2. Random other variables

I didn’t get into the substantive issues, which are outside my expertise. However, one suggestion I got was interesting: What about happiness? Without endorsing the concept of “life satisfaction” as measured by a single question, I still think this is a nice addition because it underscores the point of wide variation in how this relationship between test scores and income might be experienced.

So here is the same figure, but with the individuals coded according to how they answered the following question in 2008, when they were age 24-28, “All things considered, how satisfied are you with your life as a whole these days? Please give me an answer from 1 to 10, where 1 means extremely dissatisfied and 10 means extremely satisfied.” In the figure, Blue is least satisfied (1-6; 21%), Orange is moderately satisfied (7-8; 46%), and Green is most satisfied (9-10; 32%).


Even if you squint you probably can’t discern the pattern. Life satisfaction is positively correlated with income at .16, and less so with test scores (.07). Again, significant correlation — not helpful for planning your life.

* I actually used something similar to AFQT: the variable ASVAB, which combines tests of mathematical knowledge, arithmetic reasoning, word knowledge, and paragraph comprehension, and scales them from 0 to 100. For household income, I used a measure of household income relative to the poverty line (adjusted for household size), plus one, and transformed by natural log. I used household income because some good test-takers might marry someone with a high income, or have fewer people in their households — good decisions if your goal is maximizing household income per person.


Filed under Me @ work

How to illustrate a .61 relationship with a .93 figure: Chetty and Wilcox edition

Yesterday I wondered about the treatment of race in the blockbuster Chetty et al. paper on economic mobility trends and variation. Today, graphics and representation.

If you read Brad Wilcox’s triumphalist Slate post, “Family Matters” (as if he needed “an important new Harvard study” to write that), you saw this figure:


David Leonhardt tweeted that figure as “A reminder, via [Wilcox], of how important marriage is for social mobility.” But what does the figure show? Neither said anything more than what is printed on the figure. Of course, the figure is not the analysis. But it is what a lot of people remember about the analysis.

But the analysis on which it is based uses 741 commuting zones (metropolitan or rural areas defined by commuting patterns). So what are those 20 dots lying so perfectly along that line? In fact, that correlation printed on the graph, -.764, is much weaker than what you see plotted on the graph. The relationship you’re looking at is -.93! (thanks Bill Bielby for pointing that out).

In the paper, which presumably few of the people tweeting about it read, the authors explain that these figures are “binned scatter plots.” They broke the commuting zones into equally-sized groups and plotted the means of the x and y variables. They say they did percentiles, which would be 100 dots, but this one only has 20 dots, so let’s call them vigintiles.

In the process of analysis, this might be a reasonable way to eyeball a relationship and look for nonlinearities. But for presentation it’s wrong wrong wrong.* The dots compress the variation, and the line compresses it more. The dots give the misleading impression that you’re displaying the variance around the line. What, are you trying to save ink?

Since the data are available, we can look at this for realz. Here is the relationship with all the points, showing a much messier relationship, the actual -.76 (the range of the Chetty et al. figure, which was compressed by the binning, is shown by the blue box):

That’s 709 dots, one for each of the commuting zones for which they had sufficient data. With today’s powerful computers and high resolution screens, there is no excuse for reducing this down to 20 dots for display purposes.

But wait, there’s more. What about population differences? In the 2000 Census, these 709 commuting zones ranged in population from 5,000 (Southwest Jackson, Utah) to 16,000,000 (Los Angeles). Do you want to count Southwest Jackson as much as Los Angeles in your analysis of the relationship between these variables? Chetty et al. do in their figure. But if you weight them by population size, so each person in the population contributes equally to the relationship, that correlation that was -.76 (which they displayed as -.93) is reduced to -.61. Yikes.
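For the curious, a population-weighted correlation is just the usual Pearson formula with every sum weighted. Here is a minimal sketch in Python. The nine zones below are made up for illustration; they are not the Chetty et al. data, and this is not the Stata code I ran.

```python
import math


def weighted_corr(x, y, w):
    """Pearson correlation of x and y with observation weights w."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(var_x * var_y)


# HYPOTHETICAL zones: percent single mothers, an upward-mobility score, population.
# Six small zones sit close to a line; three huge ones do not.
single_mother_pct = [12, 15, 18, 22, 25, 30, 20, 27, 24]
mobility = [52, 50, 47, 44, 41, 38, 45, 44, 47]
population = [5_000, 20_000, 40_000, 90_000, 150_000, 300_000,
              16_000_000, 12_000_000, 9_000_000]

# Equal weights reproduce the ordinary, unweighted correlation.
unweighted = weighted_corr(single_mother_pct, mobility, [1] * len(population))
weighted = weighted_corr(single_mother_pct, mobility, population)

print(f"each zone counts once:     r = {unweighted:.2f}")
print(f"each person counts once:   r = {weighted:.2f}")
```

In this toy example the weighted correlation is much weaker than the unweighted one, because the few zones that hold most of the people are the ones that stray from the line.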

Here is what the plot looks like if you scale the commuting zones according to population size (more or less, not quite sure how Stata does this):


Now it’s messier, and the slope is much less steep. And you can see that gargantuan outlier, which turns out to be the New York commuting zone, with 12 million people and a lot more upward mobility than you would expect based on its family structure composition.

Finally, while we’re at it, we may as well attend to that nonlinearity that has been apparent since the opening figure. We can increase the variance explained from .38 to .42 by adding a quadratic term, to get this:


I hate to go beyond what the data can really tell. But — what the heck — it does appear that after 33% single-mother families, the effect hits its minimum and turns positive. These single mother figures are pretty old (when Chetty et al.’s sample were kids). Now that the country has surpassed 40% unmarried births, I think it’s safe to say we’re out of the woods. But that’s just speculation.**

*OK, OK: “wrong wrong wrong” is going too far. Absolute rules in data visualization are often wrong wrong wrong. Binning 709 groups down to 20 is extreme. Sometimes you have a zillion points. Sometimes the plot obscures the pattern. Sometimes binning is an inherent part of measurement (we usually measure age in years, for example, not seconds). None of that is an excuse in this case. However, Carter Butts sent along an example that makes the point well:


On the other hand, the Chetty et al. case is more similar to the following extreme example:

If you were interested in the relationship between age and earnings for a sample of 1,400 full-time, year-round women, you might start with this, which is a little frustrating:


The linear relationship is hard to see, but it’s about +$500 per year of age. However, the correlation is only .13, and the variance explained by linear-age alone is only 1.7%. But if you plotted the mean wage over ages, the correlation jumps to .68:


That’s a different question. It’s not, “how does age affect earnings,” it’s, “how does age affect mean earnings.” And if you binned the women into 10-year age intervals (25-34, 35-44, 45-54), and plotted the mean wage for each group, the correlation is .86.
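You can watch this inflation happen with simulated data. The sketch below is not the sample behind the figures above: it is 1,400 made-up workers with a true slope of $500 per year of age and a lot of noise (both numbers are my assumptions, picked to land near the correlations I described).

```python
import math
import random
from collections import defaultdict

random.seed(7)


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# SIMULATED workers: ages 25-54, about +$500 per year of age, noise sd $33,000.
ages = [random.randint(25, 54) for _ in range(1400)]
earnings = [30_000 + 500 * a + random.gauss(0, 33_000) for a in ages]


def binned_means(bin_of):
    """Group people by bin_of(age); return (bin keys, mean earnings per bin)."""
    groups = defaultdict(list)
    for a, e in zip(ages, earnings):
        groups[bin_of(a)].append(e)
    keys = sorted(groups)
    return keys, [sum(groups[k]) / len(groups[k]) for k in keys]


r_individual = pearson(ages, earnings)                         # 1,400 dots
r_by_year = pearson(*binned_means(lambda a: a))                # 30 dots
r_by_decade = pearson(*binned_means(lambda a: (a - 25) // 10)) # 3 dots

print(f"individuals:        r = {r_individual:.2f}")
print(f"means by age:       r = {r_by_year:.2f}")
print(f"means by decade:    r = {r_by_decade:.2f}")
```

The underlying relationship never changes. Averaging within bins throws away the variation among individuals, so the fewer the dots, the tighter they hug the line.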


Chetty et al. didn’t report the final correlation, but they showed it, even adding the regression line, so that Wilcox could call it the “bivariate relationship.”

**This paragraph was a joke that several people missed, so I’m clarifying. I would never draw a conclusion like that from the scraggly tail of a loose correlation like this.


Filed under Research reports

Where is race in the Chetty et al. mobility paper?

What does race have to do with mobility? The words “race,” “black,” or “African American” don’t appear in David Leonhardt’s report on the new Chetty et al. paper on intergenerational mobility that hit the news yesterday. Or in Jim Tankersley’s report in the Washington Post, which is amazing, because it included this figure:

That’s not exactly a map of Black America, which the Census Bureau has produced, but it’s not that far off:

But even if you don’t look at the map, what if you read the paper? Describing the series of maps of intergenerational mobility, the authors write:

Perhaps the most obvious pattern from the maps in Figure VI is that intergenerational mobility is lower in areas with larger African-American populations, such as the Southeast. … Figure IXa confirms that areas with larger African-American populations do in fact have substantially lower rates of upward mobility. The correlation between upward mobility and fraction black is -0.585. In areas that have small black populations, children born to parents at the 25th percentile can expect to reach the median of the national income distribution on average (y25;c = 50); in areas with large African-American populations, y25;c is only 35.

Here is that Figure IXa, which plots Black population composition and mobility levels for groups of commuting zones:

Yes, race is an important part of the story. In a nice part of the paper, the authors test whether Black population size is related to upward mobility for Whites (or, people in zip codes that are probably White, since race isn’t in their tax records), and find that it is. It’s not just Blacks driving the effect. I’m thinking about the historical patterns of industrial development, land ownership, the backwardness of racist elites in the South, and so on. But they’re not. For some reason, not explained at all, Chetty et al. offer this pivot:

The main lesson of the analysis in this section is that both blacks and whites living in areas with large African-American populations have lower rates of upward income mobility. One potential mechanism for this pattern is the historical legacy of greater segregation in areas with more blacks. Such segregation could potentially affect both low-income whites and blacks, as racial segregation is often associated with income segregation. We turn to the relationship between segregation and upward mobility in the next section.

And that’s it. They don’t discuss Black population size again, instead only focusing on racial segregation. They don’t pursue this “potential mechanism” in the analysis that follows. Instead, they drop percent Black for racial segregation. I have no idea why, especially considering this Table VII, which shows unadjusted (and normalized) correlations (more or less) between each variable and absolute upward mobility (the variable mapped above):

In these normalized correlations, fraction Black has a stronger relationship to mobility than racial segregation or economic segregation! In fact, it’s just about the strongest relationship on the whole long table (except for single mothers, with which it is of course highly correlated). So why do they not use it in their main models? Maybe someone else can explain this to me. (Full disclosure, my whole dissertation was about this variable.)

This is especially unfortunate because they do an analysis of the association between commuting zone family structure (using macro-level variables) and individual-level mobility, controlling for marital status — but not race — at the individual level. From this they conclude, “Children of married parents also have higher rates of upward mobility if they live in communities with fewer single parents.” I am quite suspicious that this effect is inflated by the omission of race at either level. So they write the following, which goes way beyond what they can find in the data:

Hence, family structure correlates with upward mobility not just at the individual level but also at the community level, perhaps because the stability of the social environment affects children’s outcomes more broadly.

Or maybe, race.

I explored the percent Black versus single mother question in a post a few weeks ago using the Chetty et al. data. I did two very simple OLS regression models using only the 100 largest commuting zones, weighted for population size, the first with just single motherhood, and then a model with proportion Black added:

This shows that the association between single motherhood rates and immobility is reduced by two-thirds, and is no longer significant at conventional levels, when percent Black is added to the model. That is: Percent Black statistically explains the relationship between single motherhood and intergenerational immobility across U.S. labor markets. That’s not an analysis, it’s just an argument for keeping percent Black in the more complex models. Substantively, the level of racial segregation is just one part of the complex race story – it measures one kind of inequality in a local area, but not the size of the Black population, which matters a lot (I won’t go into it all, but here are three old papers: one, two, three).

The burgeoning elite conversation about economic mobility, poverty, and inequality is good news. Its avoidance of race is not.


Filed under Research reports

Mystery solved? Why “women in their 20s” earn more

When pundits like David Brooks get sucked into the factoid-warp of Hanna Rosin (The End of Men) and Liza Mundy (The Richer Sex), they are always floored by the idea that young women earn more than young men. To them this represents the future. And woe to any woman trying to convince a jury she’s being discriminated against while these books are in the headlines. Brooks spelled it out real simple: “Women in their 20s outearn men in their 20s.”

That’s easily shown to be wrong (still holding my breath for the correction). But the more detailed factoid, the one you get in the long-soundbite version of the end-of-history, is that “median full-time wages for single childless women ages 22-30 exceeds those of single childless men in the same age group,” as reported in USA Today, for example. That was calculated by Reach Advisors using the American Community Survey.*

Making broad conclusions based on weird data slices is bad practice. And this is a great case study in why.

Who are those full-time working, not-married and childfree 20-somethings in metro areas? I ran that filter over the 2010 ACS data available from IPUMS, and this jumped out:

OK, so for whatever reason, notice that this group includes a disproportionate share of White women and Latino men. That turns out to be pivotal, since these particular Latino men have very low earnings. Check the earnings by race/ethnicity and gender:

So that’s it. The overall $1,000 advantage for women (seen in the bars on the far right) is the result of these particular Latino men’s low earnings. The high earnings of these White women are important, of course; they’re just not higher than White men’s. If you just look at Whites or Blacks there is no advantage for women.
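If it seems odd that women can come out ahead overall without being ahead within any group, here is a toy version in Python. The counts and earnings are invented for illustration (they are not the ACS numbers), and I use means rather than medians to keep the arithmetic simple.

```python
# HYPOTHETICAL counts and mean earnings, NOT the ACS figures from this post.
# Within every group, men earn at least as much as women.
groups = {
    #           (n_women, mean_women, n_men, mean_men)
    "group A": (700, 36_000, 500, 37_000),
    "group B": (100, 27_000, 400, 27_500),
    "group C": (200, 33_000, 100, 33_000),
}


def overall_mean(sex):
    """Mean earnings for 'women' or 'men' across groups, weighted by group size."""
    idx = 0 if sex == "women" else 2
    total_n = sum(g[idx] for g in groups.values())
    total_pay = sum(g[idx] * g[idx + 1] for g in groups.values())
    return total_pay / total_n


women, men = overall_mean("women"), overall_mean("men")
print(f"women overall: ${women:,.0f}")   # $34,500
print(f"men overall:   ${men:,.0f}")     # $32,800
```

The overall gap here comes entirely from composition: men are overrepresented in the lowest-earning group, which pulls their overall average down.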

I am all for getting into the problem of Latino men’s (and women’s) low average earnings. But that’s not where this story has been going. More than anything this is just shoddy statistical cherry-picking.

Hey media mega-conglomerates: give that meme a rest!

* Reach Advisors also limited the analysis to metro areas, so I did that as well. I don’t get as big an advantage for “women” as that reported in that 2010 USA Today article, which said it was based on 2008 data (they got an 8% gap, I get 3%). I don’t care to figure out exactly the source of the differences (and Reach hasn’t published their code).


Filed under In the news, Me @ work

Quick book review: The Price of Inequality

The Price of Inequality: How Today’s Divided Society Endangers Our Future, by Joseph E. Stiglitz (W. W. Norton, 2012)

My economics training as a sociologist — with a background in American Culture studies — has been spotty and roundabout. I got a healthy dose of Marxist economics in college, and then some feminist economics, a little human capital theory and some dated econometrics in grad school and since.

All that made it interesting, and also frustrating, to read The Price of Inequality, by Joseph Stiglitz – a winner of the Nobel Prize for economics and an “insanely great economist,” according to Paul Krugman.

On the plus side, I am glad to see someone within mainstream economic theory freely discussing all the ways that common assumptions simply do not predominate in the modern economic scene. Especially helpful in this category is his discussion of how “rents” accumulate vast resources at the upper end of the income distribution, with perverse effects on economic development and politics alike. At the very top — in the finance sector especially, but also in energy and big manufacturing — there is nothing like free-market competition. And the beneficiaries of those distortions are the most powerful players in the economy and political system.

It is refreshing to see this concentration of wealth described as waste and distortion, as their vast profits provide little gain to anyone else. In fact, dumping vast wealth on the 1% creates a drag on the macroeconomy while fueling the historic run-up in economic inequality. This is all very timely and takes you right through the financial crisis up to early 2012.

So if you want to understand from an economic perspective how “the market” in America isn’t the way it’s supposed to be, this book may be for you.

Top 1% income shares, including capital gains, for the U.S. and Sweden. From the World Top Incomes Database.

The other good thing about the book for many readers will be its cogent and comprehensive economic rationale for the liberal reforms that many of you probably supported already. Stiglitz makes the case that a suite of reforms – an agenda Rachel Maddow, Elizabeth Warren and Robert Reich probably agree on – would, by (directly or indirectly) increasing taxes (or reducing subsidies) on the wealthy and redistributing wealth downward, reduce the federal debt, increase economic growth, and reduce economic inequality all at the same time.

Round numbers: if the richest 1% earn about 20% of all income, then taxing them another 10% would generate government revenue equivalent to 2% of GDP. (And it wouldn’t hurt anything, since they just hoard or waste their extra cash anyway rather than “creating jobs” with it, and they’re so greedy they wouldn’t be discouraged by the disincentive effect of higher taxes.) That’s an amount of money that could actually be useful for poor people.

The frustration I feel reading the book is more amorphous. I think there have to be better ways of describing this whole system than using the language of mainstream economics, which ends up painting a picture of an entire system that does not work according to the rules as imagined. Concepts like power, social class, social networks, elites and reification do not figure heavily in this story. In fact, Stiglitz’s apparent ignorance of sociology is sometimes funny, as in this passage:

Social sciences like economics differ from the hard sciences in that beliefs affect reality: beliefs about how atoms behave don’t affect how atoms actually behave, but beliefs about how the economic system functions affect how it actually functions. George Soros, the great financier, has referred to this phenomenon as “reflexivity,” and his understanding of it may have contributed to his success.

I guess after what people like me have made of econometrics it’s only fair that economists would attribute the idea of reflexivity to Soros. (The discussion of reflexivity in Anthony Giddens’s book The Consequences of Modernity is very approachable.)

Anyway, the book is easy to read and informative, and has lots of footnotes and references.


Filed under Research reports

Do Asians in the U.S. have high incomes?

The Pew Research Center last week released a lengthy research report on Asians in the U.S., titled “The Rise of Asian Americans.” It combines information from the Census and government sources with the results of Pew’s own national survey of attitudes and opinions.

The report has lots of good information, but there are some thorny problems here. I’ll describe a few problems, then offer one data exercise to help clarify. This gets technical and it’s long, so I will give you the substantive conclusion at the top:

  1. Because Asians are a diverse category made up of groups with very different profiles, and their household composition and geographic distribution vary by national origin group, generalizations are often unhelpful.
  2. Among the 10 largest Asian groups, five (Japanese, Indian, Chinese, Filipino, Korean) are above average in income and five (Vietnamese, Pakistani, Laotian, Cambodian, Hmong) are below. But all 10 Asian groups are doing better compared to the national average than they are compared to the average incomes in the places they live — they are richer nationally than they are locally.
  3. The amount of income inequality within Asian groups varies as well. Pakistanis, Chinese, Koreans and Indians have the highest levels of inequality, while Filipinos and Laotians have low levels of inequality.

Details follow.

But first: Who is Asian? On the Census questionnaire, Asian is not exactly a category – rather, the category is created from all the responses of people who specify Asian national origins in the race question. To refresh, this is the question:

So “Asian” is all the people who specify Asian Indian, Chinese, Filipino, Japanese, Korean, Vietnamese or “Other Asian.” (The right-hand column is for Pacific Islanders.) Yes, in the U.S., Hispanic/Latino national origins are “ethnicities,” but Asian national origins are “races.” Go figure.

That lack of a common definition is compounded by two factors: First, there is so much diversity among Asians that using a single category is as challenging statistically as it is politically. And second, Asians – as the Pew report shows – have a high rate of intermarriage with Whites, as well as (among some groups) across Asian national-origin lines. As a result, some Asian groups have high rates of “multiple-race identification” – especially those whose immigration was generations ago.

The controversy over the Pew report is summarized in this Color Lines story and this response from the Asian American / Pacific Islander Policy Research Consortium. The gist of it is that the report was too rosy in its description of Asian advantages and too homogenizing in its treatment of Asian diversity – as a result repeating the “divisive trope” of the “model minority.” Here’s part of the summary from the New York Times:

Drawing on Census Bureau and other government data as well as telephone surveys from Jan. 3 to March 27 of more than 3,500 people of Asian descent, the 214-page study found that Asians are the highest-earning and best-educated racial group in the country.

Among Asians 25 or older, 49 percent hold a college degree, compared with 28 percent of all people in that age range in the United States. Median annual household income among Asians is $66,000 versus $49,800 among the general population.

In the survey, Asians are also distinguished by their emphasis on traditional family mores. About 54 percent of the respondents, compared with 34 percent of all adults in the country, said having a successful marriage was one of the most important goals in life; another was being a good parent, according to 67 percent of Asian adults, compared with about half of all adults in the general population.

Asians also place greater importance on career and material success, the study reported, values reflected in child-rearing styles. About 62 percent of Asians in the United States believe that most American parents do not put enough pressure on their children to do well in school.

Did Pew homogenize or glorify too much? I don’t know. Here’s a graph from the report, which shows that Asian groups differ, but they all have higher-than-average household incomes:

The Color Lines story quotes Deepa Iyer, head of the National Council of Asian Pacific Americans and executive director of South Asian Americans Leading Together:

The danger in framing the study the way Pew did, and the way the media picked up on it, is that folks who are in the general public and institutional stakeholders and policy makers might get the impression that they don’t necessarily need to dig deep into our communities to understand any sort of disparities that exist.

The problem of homogenizing Asians is longstanding in American sociology. In most data analyses, the Asian sample is small to begin with, so they are often collapsed into one category (which I’ve done) or dropped from the story (which I’ve also done, angering some readers). Here is a typical passage, from a 2001 article by Leslie McCall:

That didn’t stop her (or lots of other people) from extensively analyzing Asians as a combined group, and offering speculation on her results.

There are other examples. In my experience, Jen’nan Read and I broke out six Asian groups for a study of women’s employment with the 2000 Decennial Census data — which reinforced my conviction that disaggregating is best. (This 2010 Census report gives some detail on more than 20 national-origin groups.)

Some new numbers

Anyway, I’ve got four specific issues to address with Pew’s comparison of household incomes (some of which they acknowledge in the report): a) Household composition differs between groups (more or fewer kids, grandparents); b) Asians disproportionately live in parts of the U.S. with high costs of living (like Hawaii and California, and urban areas generally); c) different members of a household might have different “race” identities (so, a Korean man married to a Chinese woman might define their child as either or both); and d) levels of inequality differ between groups, so central tendency comparisons don’t capture the whole story.

In this exercise I address these problems. I adjust for household size and composition, count individuals’ own “race” rather than imposing a single identity on the household, compare incomes to the average in the local metropolitan area as well as the national average, and compare levels of within-group inequality.

All in one blog post! Someone might want to work this up into a real paper (and maybe someone else already has? The last time I really read about this was more than 10 years ago.) So I’m just offering this approach as a suggestion, and making my code available if anyone wants to pursue it (see below).

I use the 2006-2010 combined American Community Survey, from IPUMS, for maximum recent sample size. This is about 15 million people, and the Asian samples range from about 160,000 Chinese to 7,500 Laotians. I identify individuals according to their individual “race.”

I calculate their incomes as per capita household income, adjusted for economies of scale. To do that, I count adults as 1 person, kids under 18 as .7 of a person, and divide the total household income by that count to the power of .65 for economies of scale (see here for details). Then I take the natural log of all that to pull in the right tail of the distribution (so the mean isn’t pulled up by the ~1%). When I’m done, everyone in the household has the same income, and the distribution is pretty normal. Nice!
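Written out as code, the adjustment looks something like the sketch below. This is a Python illustration of the steps just described, not the code I actually ran; the function and parameter names are mine, and it assumes positive household income (the log is undefined otherwise, and how zero or negative incomes should be handled is a separate decision).

```python
import math


def adjusted_log_income(household_income, n_adults, n_children,
                        child_weight=0.7, scale_exponent=0.65):
    """Log of household income per 'equivalent person'.

    Adults count as 1, kids under 18 as child_weight (0.7), and the head count
    is raised to scale_exponent (0.65) to allow for economies of scale.
    Assumes household_income > 0.
    """
    equivalent_people = n_adults + child_weight * n_children
    per_capita = household_income / equivalent_people ** scale_exponent
    return math.log(per_capita)


# Same total income, different households (illustrative values only).
single = adjusted_log_income(80_000, n_adults=1, n_children=0)
couple_two_kids = adjusted_log_income(80_000, n_adults=2, n_children=2)

print(f"one adult:            {single:.2f}")
print(f"two adults, two kids: {couple_two_kids:.2f}")

# To read a logged value back in dollars, exponentiate it.
print(f"back in dollars:      ${math.exp(couple_two_kids):,.0f}")
```

Every person in a household gets the same value, so a bigger household with the same total income ends up with a lower score, but not proportionally lower, because of the 0.65 exponent.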

To see what this does: The mean household income for individuals in the country in 2006-2010 is $79,174, and the natural log of the composition-and-scale adjusted per capita income is 10.26 (see figure), which works out to $28,439. In comparison, the logged incomes for Asians range from 10.6 (~$40,000) for Indians and Japanese, down to 9.7 (~$16,000) for Hmong.

To deal with the issue of living in expensive areas, I take the mean of that logged income in each metropolitan area, and compare each person’s own per capita income to that. So a score of 0 means you have the average income in your area — more than 0 means richer than average, less than zero is poorer.
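The local comparison can be sketched the same way. This is a simplified Python illustration with made-up people and metro labels, assuming each record carries a logged income and a person weight; it is not the SAS code used for the figures.

```python
from collections import defaultdict

def relative_to_metro(people):
    """people: list of dicts with 'metro', 'ln_income', and 'weight' keys.

    Returns each person's logged income minus the weighted mean logged
    income of their metro area, in the same order as the input.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for p in people:
        weighted_sum[p["metro"]] += p["weight"] * p["ln_income"]
        weight_total[p["metro"]] += p["weight"]
    metro_mean = {m: weighted_sum[m] / weight_total[m] for m in weighted_sum}
    return [p["ln_income"] - metro_mean[p["metro"]] for p in people]

# Made-up data: two people in an expensive metro (A), two in a cheaper one (B).
sample = [
    {"metro": "A", "ln_income": 10.8, "weight": 1.0},
    {"metro": "A", "ln_income": 10.4, "weight": 1.0},
    {"metro": "B", "ln_income": 10.4, "weight": 1.0},
    {"metro": "B", "ln_income": 10.0, "weight": 1.0},
]
print([round(x, 2) for x in relative_to_metro(sample)])
```

In this toy example the same logged income of 10.4 scores below zero in metro A and above zero in metro B, which is the whole point of the adjustment.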

There is not one correct answer about how to do this: Having an average income in a rich area still means you can buy more stuff on Amazon than someone with a lower absolute income. But it might also mean having a smaller house, or not being considered rich by your neighbors. On the third hand, if a rich family moves to a rich area, we shouldn’t feel sorry for them for not being above average in their neighborhood. For your consideration, I show the incomes compared with the national average and with the local metro mean, for the 10 largest Asian groups:

To interpret the figure, you can see that Japanese and Indians are about 0.36 higher in log dollars than the national average but only 0.26 higher than their metro-area averages. On the downside, Hmong individuals have adjusted per capita incomes of 0.58 less than the national average, but 0.63 less than their local average.
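Log dollars are not very intuitive, so here is a small Python sketch that converts the gaps quoted above into rough percent differences. One caveat, stated as an assumption: a gap between means of logged income corresponds to a ratio of geometric means, so these percentages are approximations rather than differences in ordinary average dollars.

```python
import math

def log_gap_to_percent(log_gap):
    """Convert a difference in logged income to a percent difference in dollars."""
    return (math.exp(log_gap) - 1) * 100

for label, gap in [
    ("Japanese/Indian vs. national", 0.36),
    ("Japanese/Indian vs. metro", 0.26),
    ("Hmong vs. national", -0.58),
    ("Hmong vs. metro", -0.63),
]:
    print(f"{label}: {log_gap_to_percent(gap):+.0f}%")
```

By this conversion the 0.36 and 0.26 gaps are roughly 43% and 30% above average, and the Hmong gaps are roughly 44% and 47% below.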

Higher-than-average-income Japanese, Indians, Filipinos and Chinese are about 73% of the total; Koreans are about average, and the lower-than-average groups are 17% of the total. By this method, then, a big majority of Asians in the U.S. belong to above-local-average income groups, but a substantial fraction are well below average. And they are all doing worse relative to their metro area neighbors than they are to the national average.

Notice how it’s different from the Pew figure. In that, Vietnamese households had higher incomes than Koreans, and both were above the national average. Here Koreans are doing substantially better, mostly as a result of the household size adjustments. Also, the smaller groups I show – the ones Pew did not detail in that figure – are the poorer ones. And they are also doing worse locally relative to their national position.

Finally, consider the inequality within groups. Without doing a full-blown analysis of this, I can show the importance of the question with a simple box-and-whisker plot. This shows the distribution of income — adjusted as described above for household composition and size — for each group, including non-Asians for comparison.

The graph shows a lot of information in a small space:

  • The line through the middle of each box is the median, or mid point, of each income distribution.
  • The blue + sign is the mean. The further the mean is above the median, the more rich people there are pulling the mean up.
  • The top and bottom of the boxes are the 75th and 25th percentiles. The further apart they are, the greater the income gap between top and bottom.

(The top whiskers, which can be used to show the highest point in each distribution, aren’t shown here, because they’re so far away it would make the graph unreadable.)
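To make those markers concrete, here is a minimal Python sketch that computes them for one made-up group. The incomes are invented, the calculation is unweighted, and the quartile rule is the Python standard library’s default, which may differ slightly from what SAS uses.

```python
import statistics

def box_summary(incomes):
    """Return the four numbers the box plot marks for one group."""
    q1, median, q3 = statistics.quantiles(incomes, n=4)
    return {
        "p25": q1,
        "median": median,
        "p75": q3,
        "mean": statistics.mean(incomes),
    }

# Made-up adjusted incomes with a long right tail.
group = [12_000, 18_000, 25_000, 31_000, 40_000, 55_000, 150_000]
summary = box_summary(group)
print(summary)
print("mean minus median:", round(summary["mean"] - summary["median"]))
print("75/25 spread:", summary["p75"] - summary["p25"])
```

The single very high income pulls the mean well above the median, which is the pattern to look for in the + signs.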

As I mentioned at the top, the graph shows that Pakistanis and Chinese, and to a lesser extent Koreans and Indians, have high levels of inequality — their + signs are far from their median lines, and their 75/25 spreads are large. On the other hand, Filipinos, Laotians and Hmong have much narrower spreads.

Practically speaking, all this means that some groups are misrepresented by measures of the overall status of “Asians,” especially the smaller, poorer groups. And further, that generalizing will represent some groups worse than others because of their internal diversity. For example, the average Chinese American is quite a bit richer than the average non-Asian American, but the poorest 25% of Chinese are not much better off than the poorest 25% of the population at large.

Like I said, just an idea, with a few examples.

Take it away

Feel free to do it more, and/or better, yourself. Here’s my SAS code. Please credit me if it works, but don’t blame me if it’s wrong. This has not been peer-reviewed – it’s rough work product. Send any corrections written on the back of a $20-bill. (Everyone else: You can stop reading now!)

Just get these variables from IPUMS:


And then do this to them:

/* exclude households with no income */
if hhincome>0;
/* this codes folks into this scheme, with Asians from richest to poorest:
0="Not Asian"
1= "Japanese" 
2= "Indian" 
3= "Filipino" 
4= "Chinese" 
5= "Korean" 
6= "Vietnam" 
7= "Pakistani"
8= "Laotian"
9= "Cambodian"
10= "Hmong"
11= "OtherA"
12= "twoplusA" */
/* these codes refer to RACED, the detailed race variable on IPUMS */
/* Count asians as those who are asian alone, multiple asian, asian and white, asian and PI, or white-asian-PI */
if raced in (400 410 420 811 861 911) then asian=4;
if raced in (610 814) then asian=2;
if raced in (600 813 864 865 914) then asian=3;
if raced in (640 816) then asian=6;
if raced in (620 815) then asian=5;
if raced in (500 812) then asian=1;
if raced in (660) then asian=9;
if raced in (661) then asian=10;
if raced in (662) then asian=8;
if raced in (669) then asian=7;
if raced in ( 663 664 665 666 667 668 670 671 672 810 817 818 860 867 868 910 915) then asian = 11;
if raced in ( 673 674 675 676 677 678 679 819 869) then asian = 12;
/* so the variable labels display in output (assumes a format named asian. was already defined with proc format) */
format ASIAN asian.;
/* add the decimal to the weight variable */
format PERWT 11.2;
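/* this counts up the number of kids and adults in each household */
proc sort data=temp; by serial; run;
data hh;
set temp (keep=serial age);
by serial;
retain kids adults;
/* reset the counters at the first person in each household */
if first.serial then do; kids=0; adults=0; end;
/* kids are under 18, as described in the text; everyone else counts as an adult */
if age lt 18 then kids=kids+1;
else adults=adults+1;
keep serial kids adults;
/* output one row per household, after its last person is counted */
if last.serial;
run;
proc sort data=hh; by serial; run;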
/* this merges in those people counts, and then calculates the household income variable */
data people;
merge temp hh; by serial;
/* income per adjusted person: adults count 1, kids .7, raised to .65 for economies of scale */
equiv = hhincome/((adults+(.7*kids))**.65);
lnequiv = log(equiv);
run;
/* this outputs the mean logged household equivalent income for each metro area (with non-metro folks as 0) */
/* nway keeps only the per-metro rows and drops the overall mean */
proc means noprint nway data=people;
var lnequiv;
class metarea;
weight perwt;
output out=msa mean=msaequiv;
run;
proc sort data=msa; by metarea; run;
proc sort data=people; by metarea; run;
/* this merges in the metro area variable and calculates the income-difference variable */
data merged;
merge people (in=a) msa;
by metarea;
if a;
relhhinc = lnequiv-msaequiv;
run;
/* Distribution of the logged income variable */
proc univariate data=merged; var lnequiv; run;
proc univariate data=merged; var lnequiv; class asian; run;
/* Boxplots */
proc sort data=merged; by asian; run;
title 'Income distributions, household composition- and scale-adjusted';
proc boxplot data=merged;
 plot equiv*asian / clipfactor = 1.5 grid;
where asian le 10;
run;
/* National income means */
proc means mean data=merged;
var lnequiv;
weight perwt;
run;
/* National asian income means by group */
proc means mean missing data=merged;
var lnequiv; class asian; weight perwt;
run;
/* Relative income for each Asian group, for metro people only */
proc means mean data=merged;
var relhhinc; class asian; weight perwt;
where asian >0 and metarea>0;
run;


Filed under Me @ work, Research reports

That giant gobbling sound (is the 1% eating more and more of the cookies)

The Congressional Budget Office has a new report on trends in the income distribution. The big news is the 1%’s blitzkrieg assault on equality.

But it’s not just another rehash of Census numbers. Two adjustments they made seem especially good. First, they used a tricky matching method to combine Current Population Survey numbers (which do better at benefits and low-income households) with Internal Revenue Service data (which is better for high-end data). Second, they adjusted for household size and composition, and calculated distributions before and after taxes and transfers, and among different kinds of income.

The headline is the changing share of after-tax-and-transfer household income. Every group except the top 1% had a smaller share of income in 2007 than they did in 1979, or just an equal share in the case of the 81st-99th percentile group. That means the top quintile’s whole gain came in the top 1%.

That is very important, and a source of outrage for the hundreds of thousands of Facebook users posting, commenting, or Liking Occupy Wall St. and its related pages.

It would be misleading, however, to view the chart as showing that incomes fell for the other groups. Income growth has been very skewed toward the top, but it is by no means confined to the top 1%. Here is my graph showing the income cutoffs for each quintile, and for the top slices separately. These are the bottom cutoffs in 1979 and 2007 (in inflation-adjusted dollars), with the percentage change in the backgrounded bars.

(Note there is no cutoff for the bottom quintile — the price of entry for that group is always $0).

Two thoughts about this.

1. Even if there were no 1%, if the graph only included the green bars, there would be plenty of increasing inequality for what might then be called “the 80%” to protest. The 81st-99th folks may be lucky to have the popular anger directed at the grotesque opulence of the sliver above them. (I’m not diminishing the 1%’s income gains, but as Matt Taibbi pointed out yesterday, the object of opposition is not just their income, but their influence.)

2. If you look at the families and networks of the top 1%, how many of them have relatives, friends, and even co-“workers” who are only in the top 10%? Would a self-respecting 1% family be appalled if their son married someone from a stable 5%-er family?

What I’m wondering is whether the 1% folks are merely a statistical convenience rather than a socially cohesive group (class?). That’s an empirical question that national income distributions can’t necessarily answer.

The CBO report is here, a summary is here, and the blog post version is here.


Filed under In the news

Little income distribution graph

From the department of unhelpful statistics today I read this:

“Recent estimates indicate that at the current rate it will take more than 800 years for the bottom billion of the world population to achieve 10% of global income.”

Seems like a shockingly slow rate of progress, since anything that takes 800 years is basically not happening. But the problem is with the juxtaposition of a big number (billion) with a small fraction (10%). A billion people isn’t that big a fraction of the population anymore. Actually, if we could ever get to that level of world inequality it would be great.

Since the bottom billion of the world is about 14% of the 7 billion people in the world, getting them 10% of the global income would be a very low level of inequality: their income share would be only about 4 percentage points short of their population share, which is what a perfectly egalitarian world would give them. In the United States now, for example, the bottom 14% of families only get about 3% of the income.
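The comparison is easy to check. Here is a small Python sketch using only the round numbers in this post (7 billion people, a 10% target, and the 3%-for-the-bottom-14% figure for U.S. families); the function name is just for illustration.

```python
def share_ratio(income_share, population_share):
    """Group's average income as a fraction of the overall average income."""
    return income_share / population_share

world_population = 7_000_000_000
bottom_group = 1_000_000_000
population_share = bottom_group / world_population  # about 0.143

# Under perfect equality a group's income share equals its population share.
gap_points = (population_share - 0.10) * 100
print(f"bottom billion is {population_share:.1%} of the world")
print(f"a 10% income share is {gap_points:.1f} points short of an equal share")
print(f"world target: {share_ratio(0.10, population_share):.0%} of average income")
print(f"U.S. bottom 14% now: {share_ratio(0.03, 0.14):.0%} of average income")
```

Put that way, the “800 years” goal has the bottom billion at about 70% of the world’s average income, while the bottom 14% of U.S. families are at about 21% of theirs.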

Incidentally, here’s that family distribution:


Filed under In the news

What it’s all worth, in work-life cash

A Census Bureau research report estimates lifetime earnings by education, race/ethnicity and gender.

The report, by the Bureau’s Tiffany Julian and Robert Kominski, uses national data from the American Community Survey to create “synthetic work-life estimates” of earnings.

The method takes earnings information from one time period — in this case the years 2006-2008 combined (before the recession) — and calculates how much money people would make if they lived through their whole work lives (40 years, from age 25 to 64) during that period. Demographers use the same method to estimate life expectancy. It’s a way of using the most current period to project an image of the future in today’s shape. It’s a better look at the future, for most purposes, than looking back at the lives of people who are wrapping it up today.
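As a rough illustration of the synthetic idea, here is a Python sketch that adds up one period’s average earnings at each age from 25 to 64. The earnings profile is made up, and the report’s actual procedure may differ in details such as how ages are grouped, so treat this as the general logic only.

```python
def synthetic_worklife_earnings(mean_earnings_by_age, start_age=25, end_age=64):
    """Sum current age-specific mean earnings over a hypothetical work life."""
    return sum(mean_earnings_by_age[age] for age in range(start_age, end_age + 1))

# Made-up cross-sectional profile: earnings rise by $1,000 a year
# until age 50, then stay flat.
profile = {age: 30_000 + 1_000 * min(age - 25, 25) for age in range(25, 65)}
print(synthetic_worklife_earnings(profile))
```

No real person is followed for 40 years here. The total is what someone would earn if they passed through every age under the conditions of a single period.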

Here is a figure they made, using earnings from people working full-time and year round:

That figure assumes people work full-time and year-round for their whole work lives. That is not reasonable, of course, if people take time out of the labor force, or out of full-time work. So this understates the earnings gaps, especially by gender, since women take more time out of the labor force than men, on average.

They also reported the projected lifetime earnings for all workers — including those working only part-time or part of the year. The figure above showed a ratio of 4.7-to-1 from top to bottom, whereas the all-worker data has a ratio of 5.6-to-1 from White male professional-degree holders to Latina high school graduates.

I turned their all-worker table into this graph with men and women color coded:

This is not a real prediction, just a projection of the present into the future. But the scale is good for the imagination — the gap from top to bottom is 3.65 million dollars in 2008 terms.

Note that in addition to employer discrimination, these gaps reflect the full range of influences on people’s earnings, including sorting into occupations, part-time work, lost tenure and experience from time out of the labor force, and regional variation (which is one reason Asian workers show up high – many live in expensive cities like San Francisco and Honolulu).


Filed under Research reports

Is it a “marriage problem”?

A self-described liberal (Andrew Cherlin) and conservative (W. Bradford Wilcox) pair of academics have produced a “policy brief”* for the Brookings Institution entitled, The Marginalization of Marriage in Middle America.

There’s no new information or analysis in the report, so I won’t dwell on it. But I’d like to use it to point out a logical problem with pro-marriage social science in general. Here’s an excerpt from the introduction, with my comment following:

This policy brief reviews the deepening marginalization of marriage and the growing instability of family life among moderately-educated Americans: those who hold high school degrees but not four-year college degrees and who constitute 51 percent of the young adult population (aged twenty-five to thirty-four). … [b]oth of us agree that children are more likely to thrive when they reside in stable, two-parent homes. … Thus, we conclude by offering six policy ideas, some economic, some cultural, and some legal, designed to strengthen marriage and family life among moderately-educated Americans. … To be sure, not every married family is a healthy one that benefits children. Yet, on average, the institution of marriage conveys important benefits to adults and children. … The fact is that children born and raised in intact, married homes typically enjoy higher quality relationships with their parents, are more likely to steer clear of trouble with the law, to graduate from high school and college, to be gainfully employed as adults, and to enjoy stable marriages of their own in adulthood. Women and men who get and stay married are more likely to accrue substantial financial assets and to enjoy good physical and mental health. In fact, married men enjoy a wage premium compared to their single peers that may exceed 10 percent. At the collective level, the retreat from marriage has played a noteworthy role in fueling the growth in family income inequality and child poverty that has beset the nation since the 1970s. For all these reasons, then, the institution of marriage has been an important pillar of the American Dream, and the erosion of marriage in Middle America is one reason the dream is increasingly out of reach for men, women, and children from moderately-educated homes.

It’s obvious empirically that adults and children in married-couple families, on average, are doing better on many measures than those not in such families. The logical problem is when people conclude from this pattern that the obvious response is to “strengthen marriage and family life.” But, why not try to reduce that disparity instead?

This is the logical equivalent of the Republican mantra that “We don’t have a revenue problem in Washington; we have a spending problem.” That’s only true if you’re doing one-handed math. And the same holds for marriage.

Yes, there is less marriage, and many people are less well off without it. Does that mean we have a “marriage” problem, or a family inequality problem? Is there any other way to help people develop high quality relationships with their parents, complete more education, get better jobs, accrue financial assets and maintain good physical and mental health?

In the categorical math of inequality, you can try (with little chance of success in this case) to reduce the number of people in the disadvantaged category (non-married families), or you can try to reduce the size of the disparity between the two categories.

*I’m not sure, but I think a “policy brief” is a blog post about policy matters, produced on the PDF letterhead of a foundation. Not that there’s anything wrong with that. As far as I can tell, this one is a non-peer-reviewed essay which handles sourcing like this: “the findings detailed in this policy brief come from a new report by Wilcox, When Marriage Disappears: The New Middle America.” As I’ve pointed out (here and here), Wilcox’s reports at the National Marriage Project are also non-peer-reviewed essays with a lot of substantially misleading and erroneous content.














Filed under Me @ work