Tag Archives: inequality

Global inequality, within and between countries

Most of the talk about income inequality is about inequality within countries – between rich and poor Americans, versus between rich and poor Swedes, for example. The new special issue of Science magazine about inequality focuses that way as well, for example with this nice figure showing inequality within countries around the world.

But what if there were no income inequality within countries? If everyone within each country had the same income, but we still had rich and poor countries, how unequal would our world be? It turns out that’s an easy question to answer.

Using data from the World Bank on income for 131 countries, comprising 91% of the world population, here is the Lorenz curve showing the distribution of gross national income (GNI) by population, with each person in each country assumed to have the same income (using the purchasing power parity currency conversion). I’ve marked the place of the three largest countries: China, India, and the USA:

The Gini index value for this distribution is .48, which means the area between the Lorenz curve and the blue line (representing equality) is 48% of the lower-right triangle. (Going all the way to 1.0 would mean one person had all the money.)
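To make that definition concrete, here is a minimal Python sketch of a Gini calculation for grouped data, where everyone in a group is assumed to have the group's average income. The function name and the three made-up "countries" are illustrative assumptions, not the World Bank data or the spreadsheet behind this post.

```python
def gini_grouped(populations, incomes_per_person):
    """Gini index when every person in a group has that group's average income."""
    # Sort groups from poorest to richest, as the Lorenz curve requires.
    groups = sorted(zip(incomes_per_person, populations))
    total_pop = sum(pop for _, pop in groups)
    total_income = sum(inc * pop for inc, pop in groups)

    cum_pop = 0.0      # cumulative share of people (x-axis)
    cum_income = 0.0   # cumulative share of income (y-axis)
    area_under_lorenz = 0.0
    for inc, pop in groups:
        next_pop = cum_pop + pop / total_pop
        next_income = cum_income + inc * pop / total_income
        # Trapezoid between two consecutive points on the Lorenz curve.
        area_under_lorenz += (next_pop - cum_pop) * (cum_income + next_income) / 2
        cum_pop, cum_income = next_pop, next_income

    # The equality line encloses a triangle of area 0.5; the Gini is the
    # share of that triangle lying between the line and the Lorenz curve.
    return (0.5 - area_under_lorenz) / 0.5


# Hypothetical example: three "countries" (population in millions, income per person).
populations = [1300, 1200, 300]
incomes = [9000, 4000, 50000]
print(round(gini_grouped(populations, incomes), 2))
```

With identical incomes in every group the function returns 0; the more income is concentrated in a small group, the closer it gets to 1.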

But there is inequality within countries. In that Science figure the within-country Ginis range from .24 in Belarus to .67 in South Africa. (And that’s using after-tax household income, which assumes each person within each household has the same income. So there’s that, too.)

The World Bank data I’m using includes within-country income distributions broken into 7 quantiles: 5 quintiles (20% of the population each), with the top and bottom further broken in half. If I assume that the income is shared equally within each of these quantiles, I can take those 131 countries and turn them into 917 quantiles (just assigning each group its share of the country’s GNI). These groups range in average income from \$0 (due to rounding) in the bottom 10th of Bolivia and Guyana, or \$43 per person in the bottom 10th of the Democratic Rep. of Congo, up to \$305,800 per person in the top 10th of Macao.
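The step from countries to quantile groups can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the example country, and its income shares are hypothetical, and the only thing taken from the description above is the 10/10/20/20/20/10/10 population split.

```python
# Population share of each of the 7 groups: bottom 10th, second 10th,
# three middle quintiles, ninth 10th, top 10th.
POP_SHARES = [0.10, 0.10, 0.20, 0.20, 0.20, 0.10, 0.10]


def split_country(population, gni_per_person, income_shares):
    """Return (people, income per person) for each of the 7 groups in one country."""
    total_gni = population * gni_per_person
    groups = []
    for pop_share, inc_share in zip(POP_SHARES, income_shares):
        people = population * pop_share
        # Each group gets its share of national income, split equally inside the group.
        groups.append((people, total_gni * inc_share / people))
    return groups


# Hypothetical country: 50 million people, $10,000 per person, made-up income shares.
shares = [0.02, 0.03, 0.10, 0.15, 0.22, 0.18, 0.30]
groups = split_country(50_000_000, 10_000, shares)
for people, income in groups:
    print(f"{people:>12,.0f} people at ${income:,.0f} each")
```

Doing this for each of the 131 countries gives the 917 groups (131 × 7) that go into the second Lorenz curve.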

To illustrate this, here are India, China, and the USA, showing average incomes for the quantiles and the countries as a whole:

This shows that the average income of China’s top 10th is between the second and third quintiles of the US income distribution, and the top 10th of India has an average income comparable to the US 10-19th percentile range. Obviously, this breakdown shows a lot more inequality.

So here I add the new Lorenz curve to the first figure, counting each of those 917 quantiles as a separate group with its own income:

Now the Gini index has risen a neat 25%, to an even .60. Is that a big difference? Clearly, between-country inequality (the red line) is vast. If every country were a household, the world would be almost as unequal as Nigeria. In this comparison, you could say you get 80% of the income inequality (.48 out of .60) to show up just looking at whole countries. But of course even that obscures much more, especially at the high end, where there is no limit.

Years ago I followed the academic debate over how to measure inequality within and between countries. If I were to catch up with it again, I would start with this article, by my friends Tim Moran and Patricio Korzeniewicz. That provoked a debate over methods and theory, and they eventually published this book, which argues: “within-country analyses alone have not adequately illuminated our understanding of global stratification.” There is a lot more to read, but their work, and the critiques they’ve received, is a good place to start.

Note: I have put my Excel worksheet for this post here. It has the original data and my calculations, but not the figures.

Filed under Uncategorized

Education, not income, drives Piketty searches

Proving once again that effort is not always correlated with income, I present this critique of a Justin Wolfers blog post…

A lot of people have written reviews of Piketty. The first few pages of a Google search revealed all these (I added Heather Boushey, who wrote a good one)*:

I believe that is diversity, because every human being is different.

Anyway, where to begin? Justin Wolfers wrote a little post, not a review, but it caught my attention. The headline was, “Piketty’s Book on Wealth and Inequality Is More Popular in Richer States.” Distractable, that’s where I began.

Wolfers’ culminating line, “Vive la révolution!”, suited Scott Winship, who looked over Wolfers’ figures before sniping, “the buzz around the book has come mostly from rich liberal states along the Boston-to-Washington corridor.” But I think they’re both misinterpreting.

According to the Google search data Wolfers used, these were the top 10 states for “piketty” searches (Washington, D.C. excluded): Massachusetts, New York, Connecticut, Maryland, New Jersey, Illinois, Pennsylvania, Wisconsin, Oregon, California.

It looks to me that it’s actually education driving the search data. And that is a big difference. Let me explain.

Do data?

Microsoft Word tells me that the reading grade level of the publisher’s excerpt is 16.3, so it takes a 16th-grade education to read it. (Note that the “Boston-to-Washington corridor,” which was supposed to sound like a small sliver of the country, has 26% of the country’s college graduates.) So consider income versus college completion, which we can now take as a proxy for being able to read Piketty.

Wolfers writes, “I can’t tell you where Piketty has been least popular, because below a certain level of search activity, Google doesn’t release the actual numbers.” So he proceeds to leave 24 states out of his analysis (this will become important). Using per-capita income (converted to z-scores), and dropping 24 states plus the ridiculous outlier of DC, this is Wolfers’ income result (my calculations; he just showed scatter plots):

OK, leaving out the bottom half of the Piketty distribution, there is a strong positive relationship between per capita income and Piketty Google searches. Congratulations, you can have three jobs as an economist!

I kid Wolfers. But, come on! I don’t know what kind of data operation they’re running over there at the Upshot, but I would expect Wolfers to take it up a notch. First, control for college completion (percent of folks ages 25+ with a BA or more, also z-scored). See how it shows… oops:

The income effect is reduced but the education effect isn’t significant. (See how I showed you that instead of just going right to the results that support my argument?)

But go back to Wolfers leaving out the bottom half of the Piketty distribution. What’s wrong with that? I’m sure there’s some statistical way of explaining that, but just eyeballing it you’d have to say dropping those cases could cause trouble. The censored cases all have values of -.64 on the search variable. The relationship with income is weaker when the censored cases are included (shown in the red line) versus when he limits it to the top half of Piketty states (blue line):

What to do about this? An easy thing is just to include the censored cases at their values of -.64, just pretending -.64 is a legitimate value. That gives:

Now the income effect is reduced by about three-quarters, and the college completion effect is three times as large (with a t-statistic to match).

But that’s not the best way to handle this. If only economists had invented a way of modeling data with censored dependent variables! Just kidding: there’s Tobin’s Tobit. This kind of model says, I see your censored dependent variable, and I crash it through the bottom of the distribution as a function of its linear relationship to your independent variables. So instead of all being -.64, it lets the censored cases be as low as they want to be, with values predicted by income and college completion. Sort of. Anyway, here’s that result:

Now income is crushed, reduced to literal insignificance. What matters is the percentage of the population that has completed college. It’s not that rich people like Piketty, it’s that college graduates do. Maybe because that’s who can read it. (I don’t know, I haven’t tried.)
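For readers who want to see the mechanics, here is a rough Python sketch of the Tobit idea on simulated data (not the Google search data). It assumes a single predictor, a known censoring floor, and normal errors, and it finds the best-fitting parameters with a crude grid search instead of a real optimizer. Every name and value in it is an illustrative assumption.

```python
import math
import random


def tobit_loglik(a, b, sigma, xs, ys, floor):
    """Log-likelihood of y* = a + b*x + e, e ~ N(0, sigma^2), observed as max(y*, floor)."""
    total = 0.0
    for x, y in zip(xs, ys):
        mu = a + b * x
        if y <= floor:
            # Censored case: all we know is y* <= floor, so use the normal CDF.
            p = 0.5 * math.erfc((mu - floor) / (sigma * math.sqrt(2)))
            total += math.log(max(p, 1e-300))
        else:
            # Observed case: ordinary normal log-density.
            z = (y - mu) / sigma
            total += -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)
    return total


def ols_slope(xs, ys):
    """Slope from a simple least-squares regression of y on x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sum((x - mx) ** 2 for x in xs)


random.seed(0)
TRUE_A, TRUE_B, TRUE_SIGMA, FLOOR = 0.0, 1.0, 1.0, 0.0  # hypothetical values

xs = [random.gauss(0, 1) for _ in range(200)]
latent = [TRUE_A + TRUE_B * x + random.gauss(0, TRUE_SIGMA) for x in xs]
ys = [max(v, FLOOR) for v in latent]  # values below the floor are censored

# "Pretend the floor is a legitimate value" and run plain OLS.
naive_slope = ols_slope(xs, ys)

# Crude grid search over intercept, slope, and sigma (steps of 0.1).
grid = [(a / 10, b / 10, s / 10)
        for a in range(-6, 7) for b in range(4, 17) for s in range(6, 17)]
best_a, best_b, best_sigma = max(grid, key=lambda p: tobit_loglik(*p, xs, ys, FLOOR))

print(f"true slope {TRUE_B}, naive OLS slope {naive_slope:.2f}, Tobit slope {best_b:.1f}")
```

The point of the likelihood is in the two branches: censored cases contribute only the probability of falling at or below the floor, so the model is free to place them well below it.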

What do economists read?

Of course, mine and Wolfers’ are both pretty crude analyses. There are only two reasons his was published on a major news site and mine was buried over here on an obscure sociology blog: (a) he writes for a major news site, and (b) his weak analysis lends itself to an emerging snarky narrative in which rich leftists are seen to whine about inequality but real people can’t be bothered (the main point of Winship’s review) — just reinforcing the echo-chamber model of knowledge consumption that people who are into “data-driven” news like to appear to have risen above.

For a real explanation, Wolfers (and Winship) need look no further than the rest of the Google Correlate results page to see the obvious fact that searches for Piketty are simply correlated with interest in economics. Here’s the search that is most highly correlated with searches for “piketty” across U.S. states: “world bank gdp” (r=.98):

Here are some other searches correlated with “piketty” at .94 or higher:

economic consulting firms
eu data protection
exchange rate data
gdp by sector
inflation target
journal of labor economics
london school economics
nber working paper
oecd statistics
oxford economics
panel data stata
stock market capitalization
the economist intelligence unit
us current account deficit
world bank statistics

Well, there goes your rich, liberal, “American left” theory of who’s driving the Piketty phenomenon. It might be true, but it’s not confirmed by the Google search data. My hot new theory: college educated people who are also interested in economics are disproportionately interested in Piketty.

* The reviewer pool: Mervyn King (The Telegraph), Paul Krugman (New York Review of Books), Tyler Cowen (Foreign Affairs), James K. Galbraith (Dissent), Daniel Schuchman (Wall Street Journal), Justin Fox (Harvard Business Review), Michael Tanner (National Review), John Cassidy (New Yorker), Martin Wolf (Financial Times), Jordan Weissmann (Slate), Steven Pearlstein (Washington Post), Scott Winship (National Review), Heather Boushey (Challenge)

Filed under Uncategorized

How well do teen test scores predict adult income?

Now with new figures and notes added at the end — and a new, real life headline and graph illustrating the problem in the middle!

The short answer is, pretty well. But that’s not really the point.

In a previous post I complained about various ways of collapsing data before plotting it. Although this is useful at times, and inevitable to varying degrees, the main danger is the risk of inflating how strong an effect seems. So that’s the point about teen test scores and adult income.

If someone told you that the test scores people get in their late teens were highly correlated with their incomes later in life, you probably wouldn’t be surprised. If I said the correlation was .35, on a scale of 0 to 1, that would seem like a strong relationship. And it is. That’s what I got using the National Longitudinal Survey of Youth. I compared Armed Forces Qualification Test scores, taken in 1999, when the respondents were ages 15-19, with their household income in 2011, when they were 27-31.*

Here is the linear fit between these two measures, with the 95% confidence interval shaded, showing just how confident we can be in this incredibly strong relationship:

That’s definitely enough for a screaming headline, “How your kids’ test scores tell you whether they will be rich or poor.”

In fact, since I originally wrote this, the Washington Post Wonkblog published a post with the headline, “Here’s how much your high school grades predict your future salary,” with this incredibly tidy graph:

No doubt these are strong relationships. My correlation of .35 means AFQT explains 12% of the variation in household income. But take heart, ye parents in the age of uncertainty: 12% of the variation leaves a lot left over. This variable can’t account for how creative your children are, how sociable, how attractive, how driven, how entitled, how connected, or how White they may be. To get a sense of all the other things that matter, here is the same data, with the same regression line, but now with all 5,248 individual points plotted as well (which means we have to rescale the y-axis):

Each dot is a person’s life — or two aspects of it, anyway — with the virtually infinite sources of variability that make up the wonder of social existence. All of a sudden that strong relationship doesn’t feel like something you can bank on with any given individual. Yes, there are very few people from the bottom of the test-score distribution who are now in the richest households (those clipped by the survey’s topcode and pegged at 3 on my scale), and hardly anyone from the top of the test-score distribution who is now completely broke.
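The arithmetic behind the 12% figure above is just the correlation squared. The following Python sketch shows it on simulated data; the data are made up to have a correlation near .35 and are not the NLSY data, and the variable names are assumptions for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


random.seed(1)
TARGET_R = 0.35
N = 5248  # same number of points as in the figure; the values are simulated

scores = [random.gauss(0, 1) for _ in range(N)]
# Build an outcome whose population correlation with the score is TARGET_R.
incomes = [TARGET_R * s + math.sqrt(1 - TARGET_R ** 2) * random.gauss(0, 1)
           for s in scores]

r = pearson_r(scores, incomes)
print(f"correlation:                 {r:.2f}")
print(f"share of variance explained: {r ** 2:.2f}")
# Spread left over around the line, relative to the spread with no predictor.
print(f"residual SD / total SD:      {math.sqrt(1 - r ** 2):.2f}")
```

The last line is the useful one for thinking about individuals: with a correlation of .35, the scatter around the line is still about 94% as wide as the scatter with no predictor at all.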

But I would guess that for most kids a better predictor of future income would be spending an hour interviewing their parents and high school teachers, or spending a day getting to know them as a teenager. But that’s just a guess (and that’s an inefficient way to capture large-scale patterns).

I’m not here to argue about how much various measures matter for future income, or whether there is such a thing as general intelligence, or how heritable it is (my opinion is that a test such as this, at this age, measures what people have learned much more than a disposition toward learning inherent at birth). I just want to give a visual example of how even a very strong relationship in social science usually represents a very messy reality.

Post-publication addendums

1. Prediction intervals

I probably first wrote about this difference between the slope and the variation around the slope two years ago, in a futile argument against the use of second-person headlines such as “Homophobic? Maybe You’re Gay.” Those headlines always try to turn research into personal advice, and are almost always wrong.

Carter Butts, in personal correspondence, offered an explanation that helps make this clear. The “you” type headline presents a situation in which you — the reader — are offered the chance to add yourself to the study. In that case, your outcome (the “new response” in his note) is determined by both the line and the variation around the line. Carter writes:

the prediction interval for a new response has to take into account not only the (predicted) expectation, but also the (predicted) variation around that expectation. A typical example is attached; I generated simulated data (N=1000) via the indicated formula, and then just regressed y on x. As you’d expect, the confidence bands (red) are quite narrow, but the prediction bands (green) are large – in the true model, they would have a total width of approximately 1, and the estimated model is quite close to that. Your post nicely illustrated that the precision with which we can estimate a mean effect is not equivalent to the variation accounted for by that mean effect; a complementary observation is that the precision with which we can estimate a mean effect is not equivalent to the accuracy with which we can predict a new observation. Nothing deep about that … just the practical points that (1) when people are looking at an interval, they need to be wary of whether it is a confidence interval or a prediction interval; and (2) prediction interval can (and often should be) wide, even if the model is “good” in the sense of being well-estimated.

And here is his figure. “You” are very likely to be between the green lines, but not so likely to be between the red ones.
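A small simulation in the same spirit can be sketched in Python. The formula Carter used is not given here, so the slope and noise level below are hypothetical choices, and the 1.96 multiplier is a normal approximation to the t quantile, which is reasonable at N=1000.

```python
import math
import random

random.seed(2)
N = 1000
SLOPE, NOISE_SD = 0.5, 0.25  # hypothetical values, not Carter's formula
Z95 = 1.96                   # normal approximation to the t quantile at this N

xs = [random.uniform(0, 1) for _ in range(N)]
ys = [SLOPE * x + random.gauss(0, NOISE_SD) for x in xs]

# Ordinary least squares for y = a + b*x.
mx, my = sum(xs) / N, sum(ys) / N
sxx = sum((x - mx) ** 2 for x in xs)
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
a = my - b * mx
resid_sd = math.sqrt(sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (N - 2))


def half_widths(x0):
    """Half-widths of the 95% confidence band and prediction band at x0."""
    leverage = 1 / N + (x0 - mx) ** 2 / sxx
    confidence = Z95 * resid_sd * math.sqrt(leverage)      # uncertainty about the mean
    prediction = Z95 * resid_sd * math.sqrt(1 + leverage)  # plus variation around it
    return confidence, prediction


inside_ci = inside_pi = 0
for x, y in zip(xs, ys):
    ci, pi = half_widths(x)
    miss = abs(y - (a + b * x))
    inside_ci += miss <= ci
    inside_pi += miss <= pi

ci_mid, pi_mid = half_widths(mx)
print(f"band width at mean x: confidence {2 * ci_mid:.3f}, prediction {2 * pi_mid:.3f}")
print(f"points inside: confidence {inside_ci / N:.0%}, prediction {inside_pi / N:.0%}")
```

The only difference between the two bands is the extra 1 under the square root, which is the variation of an individual around the line. That term does not shrink as the sample grows, while the rest does.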

2. Random other variables

I didn’t get into the substantive issues, which are outside my expertise. However, one suggestion I got was interesting: What about happiness? Without endorsing the concept of “life satisfaction” as measured by a single question, I still think this is a nice addition because it underscores the point of wide variation in how this relationship between test scores and income might be experienced.

So here is the same figure, but with the individuals coded according to how they answered the following question in 2008, when they were age 24-28, “All things considered, how satisfied are you with your life as a whole these days? Please give me an answer from 1 to 10, where 1 means extremely dissatisfied and 10 means extremely satisfied.” In the figure, Blue is least satisfied (1-6; 21%), Orange is moderately satisfied (7-8; 46%), and Green is most satisfied (9-10; 32%).

Even if you squint you probably can’t discern the pattern. Life satisfaction is positively correlated with income at .16, and less so with test scores (.07). Again, significant correlation — not helpful for planning your life.

* I actually used something similar to AFQT: the variable ASVAB, which combines tests of mathematical knowledge, arithmetic reasoning, word knowledge, and paragraph comprehension, and scales them from 0 to 100. For household income, I used a measure of household income relative to the poverty line (adjusted for household size), plus one, and transformed by natural log. I used household income because some good test-takers might marry someone with a high income, or have fewer people in their households — good decisions if your goal is maximizing household income per person.

Filed under Me @ work

How to illustrate a .61 relationship with a .93 figure: Chetty and Wilcox edition

Yesterday I wondered about the treatment of race in the blockbuster Chetty et al. paper on economic mobility trends and variation. Today, graphics and representation.

If you read Brad Wilcox’s triumphalist Slate post, “Family Matters” (as if he needed “an important new Harvard study” to write that), you saw this figure:

David Leonhardt tweeted that figure as “A reminder, via [Wilcox], of how important marriage is for social mobility.” But what does the figure show? Neither said anything more than what is printed on the figure. Of course, the figure is not the analysis. But it is what a lot of people remember about the analysis.

But the analysis on which it is based uses 741 commuting zones (metropolitan or rural areas defined by commuting patterns). So what are those 20 dots lying so perfectly along that line? In fact, that correlation printed on the graph, -.764, is much weaker than what you see plotted on the graph. The relationship you’re looking at is -.93! (thanks Bill Bielby for pointing that out).

In the paper, which presumably few of the people tweeting about it read, the authors explain that these figures are “binned scatter plots.” They broke the commuting zones into equally-sized groups and plotted the means of the x and y variables. They say they did percentiles, which would be 100 dots, but this one only has 20 dots, so let’s call them vigintiles.

In the process of analysis, this might be a reasonable way to eyeball a relationship and look for nonlinearities. But for presentation it’s wrong wrong wrong.* The dots compress the variation, and the line compresses it more. The dots give the misleading impression that you’re displaying the variance around the line. What, are you trying to save ink?
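How much binning can inflate a correlation is easy to check with a simulation. The Python sketch below uses made-up, cleanly linear data with a correlation near -.76 (not the Chetty et al. data), sorts the points by x, cuts them into 20 near-equal groups, and correlates the group means. Function names are assumptions for illustration, and because the simulated data are tidier than real commuting zones, the binned value will not reproduce the -.93 exactly.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def binned_means(xs, ys, n_bins):
    """Sort by x, cut into n_bins near-equal groups, return mean x and mean y per group."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    mean_x, mean_y = [], []
    for i in range(n_bins):
        chunk = pairs[i * n // n_bins:(i + 1) * n // n_bins]
        mean_x.append(sum(x for x, _ in chunk) / len(chunk))
        mean_y.append(sum(y for _, y in chunk) / len(chunk))
    return mean_x, mean_y


random.seed(3)
N, TRUE_R = 709, -0.76  # 709 simulated "zones" with a population correlation of -.76

xs = [random.gauss(0, 1) for _ in range(N)]
ys = [TRUE_R * x + math.sqrt(1 - TRUE_R ** 2) * random.gauss(0, 1) for x in xs]

raw_r = pearson_r(xs, ys)
bx, by = binned_means(xs, ys, 20)
binned_r = pearson_r(bx, by)

print(f"correlation across all {N} points: {raw_r:.2f}")
print(f"correlation across 20 bin means:   {binned_r:.2f}")
```

Averaging within bins throws away the variation around the line but keeps the variation along it, which is why the binned dots hug the line so much more tightly than the raw points do.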

Since the data are available, we can look at this for realz. Here is the relationship with all the points, showing a much messier relationship, the actual -.76 (the range of the Chetty et al. figure, which was compressed by the binning, is shown by the blue box):

That’s 709 dots — one for each of the commuting zones for which they had sufficient data. With today’s powerful computers and high resolution screens, there is no excuse for reducing this down to 20 dots for display purposes.

But wait, there’s more. What about population differences? In the 2000 Census, these 709 commuting zones ranged in population from 5,000 (Southwest Jackson, Utah) to 16,000,000 (Los Angeles). Do you want to count Southwest Jackson as much as Los Angeles in your analysis of the relationship between these variables? Chetty et al. do in their figure. But if you weight them by population size, so each person in the population contributes equally to the relationship, that correlation that was -.76 (which they displayed as -.93) is reduced to -.61. Yikes.
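For anyone unsure what weighting a correlation by population means, here is a minimal Python sketch. The function is a straightforward weighted Pearson correlation, in which each zone counts in proportion to its population; the six zones and their values are invented for illustration and are not the Chetty et al. data.

```python
import math


def weighted_corr(xs, ys, ws):
    """Pearson correlation with each observation counted in proportion to its weight."""
    total = sum(ws)
    mx = sum(w * x for x, w in zip(xs, ws)) / total
    my = sum(w * y for y, w in zip(ys, ws)) / total
    sxy = sum(w * (x - mx) * (y - my) for x, y, w in zip(xs, ys, ws))
    sxx = sum(w * (x - mx) ** 2 for x, w in zip(xs, ws))
    syy = sum(w * (y - my) ** 2 for y, w in zip(ys, ws))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical zones: share of single-mother families, a mobility score, population.
single_mother_share = [0.12, 0.18, 0.22, 0.27, 0.33, 0.24]
mobility_score = [0.52, 0.47, 0.43, 0.40, 0.35, 0.46]
population = [5_000, 40_000, 90_000, 60_000, 30_000, 12_000_000]

unweighted = weighted_corr(single_mother_share, mobility_score, [1] * 6)
weighted = weighted_corr(single_mother_share, mobility_score, population)

print(f"each zone counts once:        {unweighted:.2f}")
print(f"each person counts once:      {weighted:.2f}")
```

Weighting by integer populations gives the same answer as listing every person separately with their zone's values, which is the sense in which each person contributes equally.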

Here is what the plot looks like if you scale the commuting zones according to population size (more or less, not quite sure how Stata does this):

Now it’s messier, and the slope is much less steep. And you can see that gargantuan outlier, which turns out to be the New York commuting zone, with 12 million people and a lot more upward mobility than you would expect based on its family structure composition.

Finally, while we’re at it, we may as well attend to that nonlinearity that has been apparent since the opening figure. We can increase the variance explained from .38 to .42 by adding a quadratic term, to get this:

I hate to go beyond what the data can really tell. But — what the heck — it does appear that after 33% single-mother families, the effect hits its minimum and turns positive. These single mother figures are pretty old (when Chetty et al.’s sample were kids). Now that the country has surpassed 40% unmarried births, I think it’s safe to say we’re out of the woods. But that’s just speculation.**

*OK, OK: “wrong wrong wrong” is going too far. Absolute rules in data visualization are often wrong wrong wrong. Binning 709 groups down to 20 is extreme. Sometimes you have a zillion points. Sometimes the plot obscures the pattern. Sometimes binning is an inherent part of measurement (we usually measure age in years, for example, not seconds). None of that is an excuse in this case. However, Carter Butts sent along an example that makes the point well:

On the other hand, the Chetty et al. case is more similar to the following extreme example:

If you were interested in the relationship between age and earnings for a sample of 1,400 full-time, year-round women, you might start with this, which is a little frustrating:

The linear relationship is hard to see, but it’s about +\$500 per year of age. However, the correlation is only .13, and the variance explained by linear-age alone is only 1.7%. But if you plotted the mean wage over ages, the correlation jumps to .68:

That’s a different question. It’s not, “how does age affect earnings,” it’s, “how does age affect mean earnings.” And if you binned the women into 10-year age intervals (25-34, 35-44, 45-54), and plotted the mean wage for each group, the correlation is .86.

Chetty et al. didn’t report the final correlation, but they showed it, even adding the regression line, so that Wilcox could call it the “bivariate relationship.”

**This paragraph was a joke that several people missed, so I’m clarifying. I would never draw a conclusion like that from the scraggly tail of a loose correlation like this.

Filed under Research reports

Where is race in the Chetty et al. mobility paper?

What does race have to do with mobility? The words “race,” “black,” or “African American” don’t appear in David Leonhardt’s report on the new Chetty et al. paper on intergenerational mobility that hit the news yesterday. Or in Jim Tankersley’s report in the Washington Post, which is amazing, because it included this figure:

That’s not exactly a map of Black America, which the Census Bureau has produced, but it’s not that far off:

But even if you don’t look at the map, what if you read the paper? Describing the series of maps of intergenerational mobility, the authors write:

Perhaps the most obvious pattern from the maps in Figure VI is that intergenerational mobility is lower in areas with larger African-American populations, such as the Southeast. … Figure IXa confirms that areas with larger African-American populations do in fact have substantially lower rates of upward mobility. The correlation between upward mobility and fraction black is -0.585. In areas that have small black populations, children born to parents at the 25th percentile can expect to reach the median of the national income distribution on average (y25;c = 50); in areas with large African-American populations, y25;c is only 35.

Here is that Figure IXa, which plots Black population composition and mobility levels for groups of commuting zones:

Yes, race is an important part of the story. In a nice part of the paper, the authors test whether Black population size is related to upward mobility for Whites (or, people in zip codes that are probably White, since race isn’t in their tax records), and find that it is. It’s not just Blacks driving the effect. I’m thinking about the historical patterns of industrial development, land ownership, the backwardness of racist elites in the South, and so on. But they’re not. For some reason, not explained at all, Chetty et al. offer this pivot:

The main lesson of the analysis in this section is that both blacks and whites living in areas with large African-American populations have lower rates of upward income mobility. One potential mechanism for this pattern is the historical legacy of greater segregation in areas with more blacks. Such segregation could potentially affect both low-income whites and blacks, as racial segregation is often associated with income segregation. We turn to the relationship between segregation and upward mobility in the next section.

And that’s it, they don’t discuss Black population size again, instead only focusing on racial segregation. They don’t pursue this “potential mechanism” in the analysis that follows. Instead, they drop percent Black for racial segregation. I have no idea why, especially considering this Table VII, which shows unadjusted (and normalized) correlations (more or less) between each variable and absolute upward mobility (the variable mapped above):

In these normalized correlations, fraction Black has a stronger relationship to mobility than racial segregation or economic segregation! In fact, it’s just about the strongest relationship on the whole long table (except for single mothers, with which it is of course highly correlated). So why do they not use it in their main models? Maybe someone else can explain this to me. (Full disclosure, my whole dissertation was about this variable.)

This is especially unfortunate because they do an analysis of the association between commuting zone family structure (using macro-level variables) and individual-level mobility, controlling for marital status — but not race — at the individual level. From this they conclude, “Children of married parents also have higher rates of upward mobility if they live in communities with fewer single parents.” I am quite suspicious that this effect is inflated by the omission of race at either level. So they write the following, which goes way beyond what they can find in the data:

Hence, family structure correlates with upward mobility not just at the individual level but also at the community level, perhaps because the stability of the social environment affects children’s outcomes more broadly.

Or maybe, race.

I explored the percent Black versus single mother question in a post a few weeks ago using the Chetty et al. data. I did two very simple OLS regression models using only the 100 largest commuting zones, weighted for population size, the first with just single motherhood, and then a model with proportion Black added:

This shows that the association between single motherhood rates and immobility is reduced by two-thirds, and is no longer significant at conventional levels, when percent Black is added to the model. That is: Percent Black statistically explains the relationship between single motherhood and intergenerational immobility across U.S. labor markets. That’s not an analysis, it’s just an argument for keeping percent Black in the more complex models. Substantively, the level of racial segregation is just one part of the complex race story. It measures one kind of inequality in a local area, but not the relative size of the Black population, which matters a lot (I won’t go into it all, but here are three old papers: one, two, three).

The burgeoning elite conversation about economic mobility, poverty, and inequality is good news. Its avoidance of race is not.

Filed under Research reports

Mystery solved? Why “women in their 20s” earn more

When pundits like David Brooks get sucked into the factoid-warp of Hanna Rosin (The End of Men) and Liza Mundy (The Richer Sex), they are always floored by the idea that young women earn more than young men. To them this represents the future. And woe to any woman trying to convince a jury she’s being discriminated against while these books are in the headlines. Brooks spelled it out real simple: “Women in their 20s outearn men in their 20s.”

That’s easily shown to be wrong (still holding my breath for the correction). But the more detailed factoid, the one you get in the long-soundbite version of the end-of-history, is that “median full-time wages for single childless women ages 22-30 exceeds those of single childless men in the same age group,” as reported in USA Today, for example. That was calculated by Reach Advisors using the American Community Survey.*

Making broad conclusions based on weird data slices is bad practice. And this is a great case study in why.

Who are those full-time working, not-married and childfree 20-somethings in metro areas? I ran that filter over the 2010 ACS data available from IPUMS, and this jumped out:

OK, so for whatever reason, notice that this group includes a disproportionate share of White women and Latino men. That turns out to be pivotal, since these particular Latino men have very low earnings. Check the earnings by race/ethnicity and gender:

So that’s it. The overall \$1,000 advantage for women (seen in the bars on the far right) is the result of these particular Latino men’s low earnings. The high earnings of these White women are important, of course, they’re just not higher than White men’s. If you just look at Whites or Blacks there is no advantage for women.
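The logic of that composition effect can be shown with a toy example in Python. All the counts and earnings below are invented, and the sketch uses means instead of the medians in the actual comparison, to keep the arithmetic obvious. In this made-up setup, women and men earn exactly the same within each group, and the only difference is how the two sexes are distributed across groups.

```python
# Hypothetical counts and average earnings; within each group women and men earn the same.
workers = {
    # group: (number of women, number of men, women's earnings, men's earnings)
    "White": (80, 60, 35_000, 35_000),
    "Latino": (20, 40, 25_000, 25_000),
}


def overall_mean(sex_index):
    """Average earnings across groups for women (sex_index=0) or men (sex_index=1)."""
    people = sum(row[sex_index] for row in workers.values())
    pay = sum(row[sex_index] * row[sex_index + 2] for row in workers.values())
    return pay / people


women_mean = overall_mean(0)
men_mean = overall_mean(1)

for group, (_, _, women_pay, men_pay) in workers.items():
    print(f"{group}: women's advantage within group = {women_pay - men_pay}")
print(f"Overall: women's advantage = {women_mean - men_mean:,.0f}")
```

An overall gap appears even though there is no gap inside either group, because a larger share of the men are in the lower-earning group.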

I am all for getting into the problem of Latino men’s (and women’s) low average earnings. But that’s not where this story has been going. More than anything this is just shoddy statistical cherry-picking.

Hey media mega-conglomerates: give that meme a rest!

* Reach Advisors also limited the analysis to metro areas, so I did that as well. I don’t get as big an advantage for “women” as that reported in that 2010 USA Today article, which said it was based on 2008 data (they got an 8% gap, I get 3%). I don’t care to figure out exactly the source of the differences (and Reach hasn’t published their code).

Filed under In the news, Me @ work

Quick book review: The Price of Inequality

The Price of Inequality: How Today’s Divided Society Endangers Our Future, by Joseph E. Stiglitz (W. W. Norton, 2012)

My economics training as a sociologist — with a background in American Culture studies — has been spotty and roundabout. I got a healthy dose of Marxist economics in college, and then some feminist economics, a little human capital theory and some dated econometrics in grad school and since.

All that made it interesting, and also frustrating, to read The Price of Inequality, by Joseph Stiglitz – a winner of the Nobel Prize for economics and an “insanely great economist,” according to Paul Krugman.

On the plus side, I am glad to see someone within mainstream economic theory freely discussing all the ways that common assumptions simply do not predominate in the modern economic scene. Especially helpful in this category is his discussion of how “rents” accumulate vast resources at the upper end of the income distribution, with perverse effects on economic development and politics alike. At the very top — in the finance sector especially, but also in energy and big manufacturing — there is nothing like free-market competition. And the beneficiaries of those distortions are the most powerful players in the economy and political system.

It is refreshing to see this concentration of wealth described as waste and distortion, as their vast profits provide little gain to anyone else. In fact, dumping vast wealth on the 1% creates a drag on the macroeconomy while fueling the historic run-up in economic inequality. This is all very timely and takes you right through the financial crisis up to early 2012.

So if you want to understand from an economic perspective how “the market” in America isn’t the way it’s supposed to be, this book may be for you.

Top 1% income shares, including capital gains, for the U.S. and Sweden. From the World Top Incomes Database.

The other good thing about the book for many readers will be its cogent and comprehensive economic rationale for the liberal reforms that many of you probably supported already. Stiglitz makes the case that a suite of reforms – an agenda Rachel Maddow, Elizabeth Warren and Robert Reich probably agree on – would, by (directly or indirectly) increasing taxes (or reducing subsidies) on the wealthy and redistributing wealth downward, reduce the federal debt, increase economic growth, and reduce economic inequality all at the same time.

Round numbers: if the richest 1% earn about 20% of all income, then taxing them another 10% would generate government revenue equivalent to 2% of GDP. (And it wouldn’t hurt anything, since they just hoard or waste their extra cash anyway rather than “creating jobs” with it, and they’re so greedy they wouldn’t be discouraged by the disincentive effect of higher taxes.) That’s an amount of money that could actually be useful for poor people.

The frustration I feel reading the book is more amorphous. I think there have to be better ways of describing this whole system than using the language of mainstream economics, which ends up painting a picture of an entire system that does not work according to the rules as imagined. Concepts like power, social class, social networks, elites and reification do not figure heavily in this story. In fact, Stiglitz’s apparent ignorance of sociology is sometimes funny, as in this passage:

Social sciences like economics differ from the hard sciences in that beliefs affect reality: beliefs about how atoms behave don’t affect how atoms actually behave, but beliefs about how the economic system functions affect how it actually functions. George Soros, the great financier, has referred to this phenomenon as “reflexivity,” and his understanding of it may have contributed to his success.

I guess after what people like me have made of econometrics it’s only fair that economists would attribute the idea of reflexivity to Soros. (The discussion of reflexivity in Anthony Giddens’s book The Consequences of Modernity is very approachable.)

Anyway, the book is easy to read and informative, and has lots of footnotes and references.

Filed under Research reports

Do Asians in the U.S. have high incomes?

The Pew Research Center last week released a lengthy research report on Asians in the U.S., titled “The Rise of Asian Americans.” It combines information from the Census and government sources with the results of Pew’s own national survey of attitudes and opinions.

The report has lots of good information, but there are some thorny problems here. I’ll describe a few problems, then offer one data exercise to help clarify. This gets technical and it’s long, so I will give you the substantive conclusion at the top:

1. Because Asians are a diverse category made up of groups with very different profiles, and their household composition and geographic distribution vary by national origin group, generalizations are often unhelpful.
2. Among the 10 largest Asian groups, five (Japanese, Indian, Chinese, Filipino, Korean) are above average in income and five (Vietnamese, Pakistani, Laotian, Cambodian, Hmong) are below. But all 10 Asian groups are doing better compared to the national average than they are compared to the average incomes in the places they live — they are richer nationally than they are locally.
3. The amount of income inequality within Asian groups varies as well. Pakistanis, Chinese, Koreans and Indians have the highest levels of inequality, while Filipinos and Laotians have low levels of inequality.

Details follow.

But first: Who is Asian? On the Census questionnaire, Asian is not exactly a category – rather, the category is created from all the responses of people who specify Asian national origins in the race question. To refresh, this is the question:

So “Asian” is all the people who specify Asian Indian, Chinese, Filipino, Japanese, Korean, Vietnamese or “Other Asian.” (The right-hand column is for Pacific Islanders.) Yes, in the U.S., Hispanic/Latino national origins are “ethnicities,” but Asian national origins are “races.” Go figure.

That lack of a common definition is compounded by two factors: First, there is so much diversity among Asians that using a single category is as challenging statistically as it is politically. And second, Asians – as the Pew report shows – have a high rate of intermarriage with Whites, as well as (among some groups) across Asian national-origin lines. As a result, some Asian groups have high rates of “multiple-race identification,” especially those whose immigration was generations ago.

The controversy over the Pew report is summarized in this Color Lines story and this response from the Asian American / Pacific Islander Policy Research Consortium. The gist of it is that the report was too rosy in its description of Asian advantages and too homogenizing in its treatment of Asian diversity – as a result repeating the “divisive trope” of the “model minority.” Here’s part of the summary from the New York Times:

Drawing on Census Bureau and other government data as well as telephone surveys from Jan. 3 to March 27 of more than 3,500 people of Asian descent, the 214-page study found that Asians are the highest-earning and best-educated racial group in the country.

Among Asians 25 or older, 49 percent hold a college degree, compared with 28 percent of all people in that age range in the United States. Median annual household income among Asians is \$66,000 versus \$49,800 among the general population.

In the survey, Asians are also distinguished by their emphasis on traditional family mores. About 54 percent of the respondents, compared with 34 percent of all adults in the country, said having a successful marriage was one of the most important goals in life; another was being a good parent, according to 67 percent of Asian adults, compared with about half of all adults in the general population.

Asians also place greater importance on career and material success, the study reported, values reflected in child-rearing styles. About 62 percent of Asians in the United States believe that most American parents do not put enough pressure on their children to do well in school.

Did Pew homogenize or glorify too much? I don’t know. Here’s a graph from the report, which shows that Asian groups differ, but they all have higher-than-average household incomes:

The Color Lines story quotes Deepa Iyer, head of the National Council of Asian Pacific Americans and executive director of South Asian Americans Leading Together:

The danger in framing the study the way Pew did, and the way the media picked up on it, is that folks who are in the general public and institutional stakeholders and policy makers might get the impression that they don’t necessarily need to dig deep into our communities to understand any sort of disparities that exist.

The problem of homogenizing Asians is longstanding in American sociology. In most data analyses, the Asian sample is small to begin with, so they are often collapsed into one category (which I’ve done) or dropped from the story (which I’ve also done, angering some readers). Here is a typical passage, from a 2001 article by Leslie McCall:

That didn’t stop her (or lots of other people) from extensively analyzing Asians as a combined group, and offering speculation on her results.

There are other examples. In my experience, Jen’nan Read and I broke out six Asian groups for a study of women’s employment with the 2000 Decennial Census data — which reinforced my conviction that disaggregating is best. (This 2010 Census report gives some detail on more than 20 national-origin groups.)

Some new numbers

Anyway, I’ve got four specific issues to address with Pew’s comparison of household incomes (some of which they acknowledge in the report): a) household composition differs between groups (more or fewer kids, grandparents); b) Asians disproportionately live in parts of the U.S. with high costs of living (like Hawaii and California, and urban areas generally); c) different members of a household might have different “race” identities (so, a Korean man married to a Chinese woman might define their child as either or both); and d) levels of inequality differ between groups, so central tendency comparisons don’t capture the whole story.

In this exercise I address these problems. I adjust for household size and composition, count individuals’ own “race” rather than imposing a single identity on the household, compare incomes to the average in the local metropolitan area as well as the national average, and compare levels of within-group inequality.

All in one blog post! Someone might want to work this up into a real paper (and maybe someone else already has? The last time I really read about this was more than 10 years ago.) So I’m just offering this approach as a suggestion, and making my code available if anyone wants to pursue it (see below).

I use the 2006-2010 combined American Community Survey, from IPUMS, for maximum recent sample size. This is about 15 million people, and the Asian samples range from about 160,000 Chinese to 7,500 Laotians. I identify individuals according to their individual “race.”

I calculate their incomes as per capita household income, adjusted for economies of scale. To do that, I count adults as 1 person, kids under 18 as .7 of a person, and divide the total household income by that count to the power of .65 for economies of scale (see here for details). Then I take the natural log of all that to pull in the right tail of the distribution (so the mean isn’t pulled up by the ~1%). When I’m done, everyone in the household has the same income, and the distribution is pretty normal. Nice!
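For a concrete sense of that adjustment, here is a minimal Python sketch (not the SAS code from this post, which is at the bottom). The function name and the example household are made up for illustration; the .7 weight for kids and the .65 exponent are the ones described above.

```python
import math

KID_WEIGHT = 0.7        # each kid counts as .7 of a person
SCALE_EXPONENT = 0.65   # economies-of-scale exponent

def equivalent_income(hh_income, adults, kids):
    """Household income divided by (adults + .7 * kids) ** .65."""
    size = adults + KID_WEIGHT * kids
    return hh_income / size ** SCALE_EXPONENT

# Hypothetical household: 2 adults, 2 kids, 80,000 dollars of total income.
equiv = equivalent_income(80_000, adults=2, kids=2)
log_equiv = math.log(equiv)  # natural log, as in the text

print(f"adjusted per capita income: {equiv:,.0f}")
print(f"logged: {log_equiv:.2f}")
```

With this formula a single adult living alone keeps their full income, while a childless couple’s income is divided by about 1.57 rather than by 2, which is the economies-of-scale part.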

To see what this does: The mean household income for individuals in the country in 2006-2010 is \$79,174, and the natural log of the composition-and-scale adjusted per capita income is 10.26 (see figure), which works out to \$28,439. In comparison, the logged incomes for Asians range from 10.6 (~\$40,000) for Indians and Japanese, down to 9.7 (~\$16,000) for Hmong.
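Getting from a logged mean back to dollars is just exponentiation. The short Python sketch below uses the rounded log values quoted above, so its dollar figures should land close to, but not exactly on, the ones in the text; the dictionary labels are only for display.

```python
import math

# Logged, composition-and-scale-adjusted incomes quoted in the text (rounded)
logged_means = {
    "All individuals": 10.26,
    "Indian / Japanese": 10.6,
    "Hmong": 9.7,
}

def log_to_dollars(value):
    """Exponentiate a mean of logged incomes to get back to dollars."""
    return math.exp(value)

dollars = {group: log_to_dollars(v) for group, v in logged_means.items()}
for group, amount in dollars.items():
    print(f"{group}: about ${amount:,.0f}")
```

Keep in mind that an exponentiated mean of logs is a geometric mean, which is never above the ordinary arithmetic mean of the same incomes.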

To deal with the issue of living in expensive areas, I take the mean of that logged income in each metropolitan area, and compare each person’s own per capita income to that. So a score of 0 means you have the average income in your area — more than 0 means richer than average, less than zero is poorer.
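Here is a small Python sketch of that local comparison, assuming person-level records with a metro area, a person weight, and a logged adjusted income. The four records and the metro names are invented; only the logic (weighted metro mean, then each person’s difference from it) follows the description above.

```python
from collections import defaultdict

# Hypothetical person records: (metro area, person weight, logged adjusted income)
people = [
    ("Metro A", 100, 10.9),
    ("Metro A", 120, 10.5),
    ("Metro B", 80, 10.1),
    ("Metro B", 90, 9.7),
]

def weighted_metro_means(records):
    """Weighted mean of logged income within each metro area."""
    totals = defaultdict(float)
    weights = defaultdict(float)
    for metro, weight, ln_income in records:
        totals[metro] += weight * ln_income
        weights[metro] += weight
    return {metro: totals[metro] / weights[metro] for metro in totals}

metro_means = weighted_metro_means(people)

# Relative income: own logged income minus the local weighted mean
relative = [
    (metro, ln_income, ln_income - metro_means[metro])
    for metro, _, ln_income in people
]
for metro, ln_income, rel in relative:
    print(f"{metro}: logged income {ln_income:.1f}, relative to metro {rel:+.2f}")
```

In this toy data the person at 10.5 in the richer metro comes out below their local average, while the person at 10.1 in the poorer metro comes out above theirs, which is exactly the kind of reordering the local comparison is meant to capture.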

There is not one correct answer about how to do this: Having an average income in a rich area still means you can buy more stuff on Amazon than someone with a lower absolute income. But it might also mean having a smaller house, or not being considered rich by your neighbors. On the third hand, if a rich family moves to a rich area, we shouldn’t feel sorry for them for not being above average in their neighborhood. For your consideration, I show the incomes compared with the national average and with the local metro mean, for the 10 largest Asian groups (click for higher resolution):

To interpret the figure, you can see that Japanese and Indians are about 0.36 higher in log dollars than the national average but only 0.26 higher than their metro-area averages. On the downside, Hmong individuals have adjusted per capita incomes of 0.58 less than the national average, but 0.63 less than their local average.
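Differences in log dollars can be turned into approximate percentage gaps with exp(gap) - 1. A quick Python sketch, using the four gaps just quoted (the labels are mine):

```python
import math

def log_gap_to_percent(gap):
    """Convert a difference in logged income into a percentage difference."""
    return (math.exp(gap) - 1) * 100

gaps = {
    "Japanese/Indian vs. national average": 0.36,
    "Japanese/Indian vs. metro average": 0.26,
    "Hmong vs. national average": -0.58,
    "Hmong vs. metro average": -0.63,
}

percents = {label: log_gap_to_percent(gap) for label, gap in gaps.items()}
for label, pct in percents.items():
    print(f"{label}: {pct:+.0f}%")
```

So 0.36 and 0.26 correspond to roughly 43% and 30% above average, and -0.58 and -0.63 to roughly 44% and 47% below.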

Higher-than-average-income Japanese, Indians, Filipinos and Chinese are about 73% of the total; Koreans are about average, and the lower-than-average groups are 17% of the total. By this method, then, a big majority of Asians in the U.S. belong to above-local-average income groups, but a substantial fraction are well below average. And they are all doing worse relative to their metro area neighbors than they are to the national average.

Notice how it’s different from the Pew figure. In that, Vietnamese households had higher incomes than Koreans, and both were above the national average. Here Koreans are doing substantially better, mostly as a result of the household size adjustments. Also, the smaller groups I show – the ones Pew did not detail in that figure – are the poorer ones. And they are also doing worse locally relative to their national position.

Finally, consider the inequality within groups. Without doing a full-blown analysis of this, I can show the importance of the question with a simple box-and-whisker plot. This shows the distribution of income — adjusted as described above for household composition and size — for each group, including non-Asians for comparison.

The graph shows a lot of information in a small space:

• The line through the middle of each box is the median, or mid point, of each income distribution.
• The blue + sign is the mean. The further the mean is above the median, the more rich people there are pulling the mean up.
• The top and bottom of the boxes are the 75th and 25th percentiles. The further apart they are, the greater the income gap between top and bottom.

(The top whiskers, which can be used to show the highest point in each distribution, aren’t shown here, because they’re so far away it would make the graph unreadable.)
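If you want to see where those pieces of the plot come from, here is a minimal Python sketch that computes them for one made-up, right-skewed list of incomes. It is unweighted and uses the standard library’s default quantile method, so it is an illustration of the idea rather than a replica of the figure.

```python
import statistics

def box_summary(incomes):
    """Median, mean, and 25th/75th percentiles, as drawn in a box plot."""
    p25, _, p75 = statistics.quantiles(incomes, n=4)
    return {
        "p25": p25,
        "median": statistics.median(incomes),
        "p75": p75,
        "mean": statistics.mean(incomes),
        "spread_75_25": p75 - p25,
    }

# Hypothetical adjusted incomes with one very high earner
incomes = [12_000, 18_000, 22_000, 25_000, 30_000, 34_000, 40_000, 55_000, 250_000]

summary = box_summary(incomes)
for name, value in summary.items():
    print(f"{name}: {value:,.0f}")
```

The single high earner pulls the mean well above the median without moving the box much, which is why the gap between the + sign and the median line is a useful quick signal of a long right tail.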

As I mentioned at the top, the graph shows that Pakistanis and Chinese, and to a lesser extent Koreans and Indians, have high levels of inequality — their + signs are far from their median lines, and their 75/25 spreads are large. On the other hand, Filipinos, Laotians and Hmong have much narrower spreads.

Practically speaking, all this means that some groups are misrepresented by measures of the overall status of “Asians,” especially the smaller, poorer groups. And further, that generalizing will represent some groups worse than others because of their internal diversity. For example, the average Chinese American is quite a bit richer than the average non-Asian American, but the poorest 25% of Chinese are not much better off than the poorest 25% of the population at large.

Like I said, just an idea, with a few examples.

Take it away

Feel free to do it more, and/or better, yourself. Here’s my SAS code. Please credit me if it works, but don’t blame me if it’s wrong. This has not been peer-reviewed – it’s rough work product. Send any corrections written on the back of a \$20 bill. (Everyone else: You can stop reading now!)

Just get these variables from IPUMS:

```
SERIAL
METAREA
HHINCOME
PERWT
AGE
RACED
```

And then do this to them:

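```
/* read the IPUMS extract into temp; "ipums" is a placeholder name for your own extract */
data temp;
set ipums;
/* exclude households with no income */
if hhincome>0;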
/* this codes folks into this scheme, with Asians from richest to poorest:
0="Not Asian"
1= "Japanese"
2= "Indian"
3= "Filipino"
4= "Chinese"
5= "Korean"
6= "Vietnam"
7= "Pakistani"
8= "Laotian"
9= "Cambodian"
10= "Hmong"
11= "OtherA"
12= "twoplusA"
*/
/* these codes refer to RACED, the detailed race variable on IPUMS */
/* Count asians as those who are asian alone, multiple asian, asian and white, asian and PI, or white-asian-PI */
asian=0;
if raced in (400 410 420 811 861 911) then asian=4;
if raced in (610 814) then asian=2;
if raced in (600 813 864 865 914) then asian=3;
if raced in (640 816) then asian=6;
if raced in (620 815) then asian=5;
if raced in (500 812) then asian=1;
if raced in (660) then asian=9;
if raced in (661) then asian=10;
if raced in (662) then asian=8;
if raced in (669) then asian=7;
if raced in ( 663 664 665 666 667 668 670 671 672 810 817 818 860 867 868 910 915) then asian = 11;
if raced in ( 673 674 675 676 677 678 679 819 869) then asian = 12;
/* so the variable labels display in output; the METAREA_f. and asian. formats must already be defined with proc format */
format
METAREA METAREA_f.
ASIAN asian.
;
/* add the decimal to the weight variable */
format PERWT 11.2;
run;
/* this counts up the number of kids and adults in each household */
proc sort data=temp; by serial; run;
data hh;
set temp (keep=serial age);
by serial;
if first.serial then do;
kids=0;
adults=0;
end;
retain kids adults;
/* as written, 18-year-olds are counted as kids (age le 18) */
if age le 18 then do; kids=kids+1; end;
if age gt 18 then do; adults=adults+1; end;
keep serial kids adults;
if last.serial;
run;
proc sort data=hh; by serial; run;
/* this merges in those people counts, and then calculates the household income variable */
data people;
merge temp hh; by serial;
equiv = hhincome/((adults+(.7*kids))**.65);
lnequiv = log(hhincome/((adults+(.7*kids))**.65));
run;
/* this outputs the mean logged household equivalent income for each metro area (with non-metro folks as 0) */
proc means noprint data=people;
var lnequiv;
class metarea;
weight perwt;
output out=msa mean=msaequiv;
run;
proc sort data=msa; by metarea; run;
proc sort data=people; by metarea; run;
/* this merges in the metro area variable and calculates the income-difference variable */
data merged;
merge people (in=a) msa;
by metarea;
if a;
relhhinc = lnequiv-msaequiv;
run;
/* Distribution of the logged income variable */
proc univariate data=merged; var lnequiv; run;
proc univariate data=merged; var lnequiv; class asian; run;
/* Boxplots */
proc sort data=merged; by asian; run;
title 'Income distributions, household composition- and scale-adjusted';
proc boxplot data=merged;
plot equiv*asian / clipfactor = 1.5 grid;
where asian le 10;
run;
title;
/* National income means */
proc means mean data=merged;
var lnequiv;
weight perwt;
run;
/* National asian income means by group */
proc means mean missing data=merged;
var lnequiv; class asian; weight perwt;
run;
/* Relative income for each Asian group, for metro people only */
proc means mean data=merged;
var relhhinc; class asian; weight perwt;
where asian >0 and metarea>0;
run;
```

Filed under Me @ work, Research reports

That giant gobbling sound (is the 1% eating more and more of the cookies)

The Congressional Budget Office has a new report on trends in the income distribution. The big news is the 1%’s blitzkrieg assault on equality.

But it’s not just another rehash of Census numbers. Two adjustments they made seem especially good. First, they used a tricky matching method to combine Current Population Survey numbers (which do better at benefits and low-income households) with Internal Revenue Service data (which is better for high-end data). Second, they adjusted for household size and composition, and calculated distributions before and after taxes and transfers, and among different kinds of income.

The headline is the changing share of after-tax-and-transfer household income. Every group except the top 1% had a smaller share of income in 2007 than they did in 1979, or just an equal share in the case of the 81st-99th percentile group. That means the top quintile’s whole gain came in the top 1%.

That is very important, and it is a source of outrage for the hundreds of thousands of Facebook users posting, commenting on, or Liking Occupy Wall St. and its related pages.

It would be misleading, however, to view the chart as showing that incomes fell for the other groups. Income growth has been very skewed toward the top, but it is by no means confined to the top 1%. Here is my graph showing the income cutoffs for each quintile, and for the top slices separately. These are the bottom cutoffs in 1979 and 2007 (in inflation-adjusted dollars), with the percentage change in the backgrounded bars.

(Note there is no cutoff for the bottom quintile — the price of entry for that group is always \$0).
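The difference between a falling share and a falling income is easy to see with made-up numbers. This Python sketch uses hypothetical totals (not CBO figures) in which the bottom group’s income grows while its share of the total shrinks.

```python
# Hypothetical total income by group (arbitrary units), in two periods
before = {"bottom 99%": 900.0, "top 1%": 100.0}
after = {"bottom 99%": 1170.0, "top 1%": 330.0}

def shares(incomes):
    """Each group's fraction of total income."""
    total = sum(incomes.values())
    return {group: value / total for group, value in incomes.items()}

share_before = shares(before)
share_after = shares(after)

for group in before:
    growth = after[group] / before[group] - 1
    print(
        f"{group}: income {growth:+.0%}, "
        f"share {share_before[group]:.0%} -> {share_after[group]:.0%}"
    )
```

Here the bottom group’s income rises by 30% while its share drops from 90% to 78%, because the top group’s income grows much faster.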

Two thoughts about this.

1. Even if there were no 1%, if the graph only included the green bars, there would be plenty of increasing inequality for what might then be called “the 80%” to protest. The 81st-99th folks may be lucky to have the popular anger directed at the grotesque opulence of the sliver above them. (I’m not diminishing the 1%’s income gains, but as Matt Taibbi pointed out yesterday, the object of opposition is not just their income, but their influence.)

2. If you look at the families and networks of the top 1%, how many of them have relatives, friends, and even co-“workers” who are only in the top 10%? Would a self-respecting 1% family be appalled if their son married someone from a stable 5%-er family?

What I’m wondering is whether the 1% folks are merely a statistical convenience rather than a socially cohesive group (class?). That’s an empirical question that national income distributions can’t necessarily answer.

The CBO report is here, a summary is here, and the blog post version is here.

Filed under In the news

Little income distribution graph

From the department of unhelpful statistics today I read this:

“Recent estimates indicate that at the current rate it will take more than 800 years for the bottom billion of the world population to achieve 10% of global income.”

Seems like a shockingly slow rate of progress, since anything that takes 800 years is basically not happening. But the problem is with the juxtaposition of a big number (billion) with a small fraction (10%). A billion people isn’t that big a fraction of the population anymore. Actually, if we could ever get to that level of world inequality it would be great.

Since the bottom billion of the world is about 14% of the 7 billion people in the world, getting them 10% of the global income would be a very low level of inequality: they’d only be 4 percentage points away from the share they would have in a perfectly egalitarian world. In the United States now, for example, the bottom 14% of families only get about 3% of the income.
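The comparison can be put on one scale by dividing each group’s income share by its population share, so that 1.0 means a proportional share. A minimal Python sketch using the round numbers above (the function name is mine):

```python
def share_ratio(income_share, population_share):
    """Income share divided by population share; 1.0 is a proportional share."""
    return income_share / population_share

world_bottom_pop = 1 / 7  # bottom billion out of about 7 billion people
world_target = share_ratio(0.10, world_bottom_pop)  # the "800 years" goal
us_now = share_ratio(0.03, 0.14)  # bottom 14% of U.S. families, about 3% of income

print(f"bottom billion as a share of world population: {world_bottom_pop:.1%}")
print(f"world goal, income share / population share: {world_target:.2f}")
print(f"U.S. now, income share / population share: {us_now:.2f}")
```

By this measure the supposedly distant world goal would give the bottom group about 70% of a proportional share, compared with a bit over 20% for the bottom 14% of U.S. families.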

Incidentally, here’s that family distribution:

Filed under In the news