
Pregnancy discrimination and the gender gap, involuntary job choice edition

From Rachel Swarns at the New York Times comes the story of a woman, Angelica Valencia, fired from her $8.70-an-hour produce-packing job after her doctor said she couldn’t work overtime because she was three months into a risky pregnancy. There actually is a new law on her side, but her employer somehow didn’t get around to notifying her of her right to reasonable accommodation.

Before reading my comment on this, why not check out this new video from the chapter on gender in my book. The video accompanies a much more compelling version of this graphic, showing the gender composition of some occupations, calculated from the American Community Survey:

[Figure: gender composition of selected occupations, calculated from the American Community Survey]

Count that gender gap

OK, Back to Angelica Valencia. I’m not an expert on pregnancy discrimination, but I want to use this to comment on how we look at the gender gap in pay. The Census Bureau reports on the gender gap this way:

In 2013, the median earnings of women who worked full time, year-round ($39,157) was 78 percent of that for men working full time, year-round ($50,033).

Critics complain that this doesn’t account for occupational choice, time out of the labor force, and so on. As Ruth Davis Konigsberg sneeringly put it in Time:

Women don’t make 77 cents to a man’s dollar. They make more like 93 cents, as long as they don’t major in art history.

And Hanna Rosin helpfully explained:

Women congregate in different professions than men do, and the largely male professions tend to be higher-paying.

So what does the story of Angelica Valencia’s pregnancy tell us (besides the pitfalls of majoring in art history)? Valencia may end up winning some back pay in a lawsuit. But let’s assume someone just like her didn’t, and ended up instead in a lower-paying job without overtime, such as at McDonald’s. If we insist on statistically controlling for occupation, hours, job tenure, and time out of the labor force in order to see the real wage gap, people like Valencia may not show up as underpaid women — if they’re paid the same as men in the same jobs, holding constant hours, job tenure, and time out of the labor force. So the very thing that makes Valencia earn less — being fired for getting pregnant — disappears from the wage gap analysis. Instead, the data show that women take more time off work, work fewer hours, change jobs more often, and “choose” less lucrative occupations.

Sure, a lot of women choose to get pregnant (and a lot of men choose to become fathers). But getting fired and ending up in a lower-paid job as a result is not part of that choice (and it doesn’t happen to fathers). The overall difference in pay between men and women, which reflects a complicated mix of factors, is a good indicator of inequality.

For background on the motherhood penalty in wages, you might start here or here (including the sources they cite).

1 Comment

Filed under In the news, Me @ work

The number one cause of traffic fatalities

Please don’t text while driving.

Note: I have updated this post to reflect a response I received from Matt Richtel.

A data illustration follows the rant.

I don’t yet have a copy of Matt Richtel’s new book, A Deadly Wandering: A Tale of Tragedy and Redemption in the Age of Attention. The book is based on his Pulitzer Prize-winning reporting for the New York Times, but I’m afraid it’s unlikely to do justice to the complexity of the relationship between mobile phones and motor vehicle accidents. Worse, I fear it distracts attention from the most important cause of traffic fatalities: driving.

A bad sign

The other day Richtel tweeted a link to this old news article that claims texting causes more fatal accidents for teens than alcohol. The article says some researcher estimates “more than 3,000 annual teen deaths from texting,” but there is no reference to a study or any source for the data used to make the estimate. As I previously noted, that’s not plausible.

In fact, only 2,823 teens died in motor vehicle accidents in 2012 (only 2,228 of whom were vehicle occupants). So I get 7.7 teens per day dying in motor vehicle accidents, regardless of the cause. I’m no Pulitzer Prize-winning New York Times journalist, but I reckon that makes this giant factoid on Richtel’s website wrong, which doesn’t bode well for the book:

[Screenshot: the “11 teens a day” claim on Richtel’s book website]

In fact, I suspect the 11-per-day meme comes from Mother Jones (or someone they got it from) doing the math wrong on that Newsday number of 3,000 per year and calling it “nearly a dozen” (3,000 per year is 8.2 per day). And if you Google around looking for this 11-per-day statistic, you find sites like textinganddrivingsafety.com, which, like Richtel does in his website video, attributes the statistic to the “Institute for Highway Safety.” I think they mean the Insurance Institute for Highway Safety, which is the source I used for the 2,823 number above. (The fact that he gets the name wrong suggests he got the statistic second-hand.) IIHS has an extensive page of facts on distracted driving, which doesn’t have any fact like this (they actually express skepticism about inflated claims of cellphone effects).
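For what it’s worth, the per-day arithmetic is easy to check. Here is a trivial sketch using only the numbers quoted above:

```python
teen_deaths_2012 = 2_823       # IIHS count of all teen motor vehicle deaths in 2012
newsday_estimate = 3_000       # the unsourced "texting deaths" figure

print(round(teen_deaths_2012 / 365, 1))   # ~7.7 per day, from any cause
print(round(newsday_estimate / 365, 1))   # ~8.2 per day -- still not 11 a day
```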

After I contacted him to complain about that 11-teens-per-day statistic, Richtel pointed out that the page I linked to is run by his publisher, not him, and that he had asked them to “deal with that stat.” I now see that the page includes a footnote that says, “Statistic taken from the Insurance Institute for Highway Safety’s Fatality Facts.” I don’t think that’s true, however, since the “Fatality Facts” page for teenagers still shows 2,228 teens (passengers and drivers) killed in 2012. Richtel added in his email to me:

As I’ve written in previous writings, the cell phone industry also takes your position that fatality rates have fallen. It’s a fair question. Many safety advocates point to air bags, anti-lock brakes and wider roads — billions spent on safety — driving down accident rates (although accidents per miles driven is more complex). These advocates say that accidents would’ve fallen far faster without mobile phones and texting. And they point out that rates have fallen far faster in other countries (deaths per 100,000 drivers) that have tougher laws. In fact, the U.S. rates, they say, have fallen less far than most other countries. Thank you for your thoughtful commentary on this. I think it’s a worthy issue for conversation.

I appreciate his response. Now I’ll read the book before complaining about him any more.

The shocking truth

I generally oppose scare-mongering manipulations of data that take advantage of common ignorance. The people selling mobile-phone panic don’t dwell on the fact that the roads are getting safer and safer, and just let you go on assuming they’re getting more and more dangerous. I reviewed all that here, showing the increase in mobile phone subscriptions relative to the decline in traffic accidents, injuries, and deaths.

That doesn’t mean texting and driving isn’t dangerous. I’m sure it is. Cell phone bans may be a good idea, although the evidence that they save lives is mixed. But the overall situation is surely more complicated than TEXTING-WHILE-DRIVING EPIDEMIC suggests. The whole story doesn’t seem right — how can phones be so dangerous, and growing more and more pervasive, while accidents and injuries fall? At the very least, a powerful part of the explanation is being left out. (I wonder if phones displace other distractions, like eating and putting on makeup; or if some people drive more cautiously while they’re using their phones, to compensate for their distraction; or if distracted phone users were simply the worst drivers already.)

Beyond the general complaint about misleading people and abusing our ignorance, however, the texting scare distracts us (I know, it’s ironic) from the giant problem staring us in the face: our addiction to private vehicles itself costs thousands of lives a year (not including the environmental effects).

To illustrate this, I went through all the trouble of getting data on mobile phone subscriptions by state, to compare with state traffic fatality rates, only to find this: nothing:

[Figure: state traffic death rates vs. mobile phone subscriptions per capita — no relationship]

What does predict deaths? Driving. This isn’t a joke. Sometimes the obvious answer is obvious because it’s the answer:

[Figure: state traffic death rates vs. miles driven per capita — a strong positive relationship]

If you’re interested, I also put both of these variables in a regression, along with age and sex composition of the states, and the percentage of employed people who drive to work. Only the miles and drive-to-work rates were correlated with vehicle deaths. Mobile phone subscriptions had no effect at all.
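If you want to see roughly what that kind of state-level regression looks like, here is a minimal sketch in Python with statsmodels. The column names and all the values are my own inventions standing in for the real state file; it shows the mechanics, not the results.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Fifty rows of made-up state-level values with hypothetical column names.
rng = np.random.default_rng(0)
states = pd.DataFrame({
    "miles_per_capita": rng.uniform(5, 12, 50),        # thousands of miles driven
    "phone_subs_per_capita": rng.uniform(0.8, 1.1, 50),
    "pct_drive_to_work": rng.uniform(60, 90, 50),
    "pct_male": rng.uniform(48, 52, 50),
    "median_age": rng.uniform(30, 42, 50),
})
states["deaths_per_100k"] = 0.7 * states["miles_per_capita"] + rng.normal(0, 1, 50)

model = smf.ols(
    "deaths_per_100k ~ miles_per_capita + phone_subs_per_capita"
    " + pct_drive_to_work + pct_male + median_age",
    data=states,
).fit()
print(model.summary())   # in the post, only miles driven and drive-to-work mattered
```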

Also, pickups?

Failing to find a demographic predictor that accounts for any of the variation beyond that explained by miles driven, I tried one more thing. I calculated each state’s deviation from the line predicted by miles driven (for example Alaska, where they only drive 6.3 thousand miles per person, is predicted to have 4.5 deaths per 100,000 but actually has 8.1, putting that state 3.6 points above the line). Taking those numbers and pouring them into the Google Correlate tool, I asked what people in those states with higher-than-expected death rates are searching for. And the leading answer is large American pickup trucks. Among the 100 searches most correlated with this variable, 10 were about Chevy, Dodge, or Ford pickup trucks, like “2008 chevy colorado” (r = .68), shown here:

[Figure: excess state death rates vs. search interest in “2008 chevy colorado” (r = .68)]

I could think of several reasons why places where people are into pickup trucks have more than their predicted share of fatal accidents.
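The deviation-from-the-line step is just the residual from a miles-only regression. Here is a toy sketch of the mechanics; Alaska’s miles and death rate come from the paragraph above, and the other rows are invented.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "state": ["Alaska", "X", "Y", "Z"],
    "miles_per_capita": [6.3, 9.0, 10.5, 8.0],   # thousands of miles per person
    "deaths_per_100k": [8.1, 9.0, 11.5, 7.5],
})
fit = smf.ols("deaths_per_100k ~ miles_per_capita", data=df).fit()
df["above_the_line"] = df["deaths_per_100k"] - fit.fittedvalues
print(df.sort_values("above_the_line", ascending=False))
# It's these residuals, not the raw death rates, that went into Google Correlate.
```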

So, to sum up: texting while driving is dangerous and getting more common as driving is getting safer, but driving still kills thousands of Americans every year, making it the umbrella social problem under which texting may be one contributing factor.

I used this analogy before, and the parallel isn’t perfect, but the texting panic reminds me of the 1970s “Crying Indian” ad I used to see when I was watching Saturday morning cartoons. The ad famously pivoted from industrial pollution to littering in the climactic final seconds:

Conclusion: Keep your eye on the ball.

15 Comments

Filed under In the news

New York City police killings: 1964 (life) – 1989 (art) – 2014 (life)

In July 1964, just after the passage of the Civil Rights Act, White New York City police officer Thomas Gilligan killed Black 15-year-old James Powell. After two days of peaceful protest, police and protesters clashed and six nights of violence followed. This is not James Powell being killed, just another guy being beaten:

[Photo: a man being beaten by police during the 1964 unrest]

In the summer of 1989, Spike Lee’s movie Do the Right Thing featured the killing of Radio Raheem by White police — using the already-infamous chokehold — after they swept into the sweltering neighborhood, where a fight had broken out. The climactic incident sparked an explosive riot (watch the scene on Hulu with membership):

[Still: the death of Radio Raheem in Do the Right Thing]

Now, another quarter century later, police on Staten Island have apparently choked 43-year-old Eric Garner to death after he refused to cooperate with whatever random demand they had, as captured on video (and posted by the Daily News):

[Photo: Eric Garner being taken down in a chokehold, from the Daily News video]

Now the chokehold is against police department rules, but the number of chokehold complaints — a statistic the department keeps — has been rising and last year reached 233, only a “tiny fraction” of which are substantiated. In the Daily News video, Garner is heard saying, “I can’t breathe” many times.

UPDATE: Spike Lee has now produced a video splicing together the chokehold scenes of Eric Garner and Radio Raheem. It’s embedded on Indiewire here.

3 Comments

Filed under In the news

Global inequality, within and between countries

Most of the talk about income inequality is about inequality within countries – between rich and poor Americans, or between rich and poor Swedes, for example. The new special issue of Science magazine about inequality focuses that way as well, for example with this nice figure showing inequality within countries around the world.

But what if there were no income inequality within countries? If everyone within each country had the same income, but we still had rich and poor countries, how unequal would our world be? It turns out that’s an easy question to answer.

Using data from the World Bank on income for 131 countries, comprising 91% of the world population, here is the Lorenz curve showing the distribution of gross national income (GNI) by population, with each person in each country assumed to have the same income (using the purchasing power parity currency conversion). I’ve marked the place of the three largest countries: China, India, and the USA:

[Figure: Lorenz curve of world income with countries treated as internally equal, marking China, India, and the USA]

The Gini index value for this distribution is .48, which means the area between the Lorenz curve and the blue line – the line representing equality – is 48% of the lower-right triangle. (Going all the way to 1.0 would mean one person had all the money.)
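If you want to reproduce this kind of calculation, here is a rough sketch of a population-weighted Gini index computed from the Lorenz curve, under the same assumption that each group (here, each country) is internally equal. The three countries at the bottom are made-up toy numbers, not the World Bank data.

```python
import numpy as np

def gini(income_per_capita, population):
    """Population-weighted Gini for groups assumed to be internally equal."""
    inc = np.asarray(income_per_capita, dtype=float)
    pop = np.asarray(population, dtype=float)
    order = np.argsort(inc)        # the Lorenz curve needs groups sorted by income
    inc, pop = inc[order], pop[order]
    cum_pop = np.concatenate(([0.0], np.cumsum(pop) / pop.sum()))
    cum_inc = np.concatenate(([0.0], np.cumsum(inc * pop) / (inc * pop).sum()))
    # Area under the Lorenz curve by trapezoids; Gini = 1 - 2 * that area.
    area = np.sum((cum_pop[1:] - cum_pop[:-1]) * (cum_inc[1:] + cum_inc[:-1]) / 2)
    return 1 - 2 * area

print(round(gini([1_000, 5_000, 40_000], [1.3e9, 1.2e9, 0.3e9]), 2))
```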

But there is inequality within countries. In that Science figure the within-country Ginis range from .24 in Belarus to .67 in South Africa. (And that’s using after-tax household income, which assumes each person within each household has the same income. So there’s that, too.)

The World Bank data I’m using includes within-country income distributions broken into 7 quantiles: 5 quintiles (20% of the population each), with the top and bottom further broken in half. If I assume that the income is shared equally within each of these quantiles, I can take those 131 countries and turn them into 917 quantiles (just assigning each group its share of the country’s GNI). These groups range in average income from $0 (due to rounding) in the bottom 10th of Bolivia and Guyana, or $43 per person in the bottom 10th of the Democratic Rep. of Congo, up to $305,800 per person in the top 10th of Macao.
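Here is a sketch of that splitting step, with hypothetical income shares for one imaginary country; each country contributes 7 groups, and all 917 of them can then be fed into the gini() function sketched above.

```python
import numpy as np

# The seven groups: bottom 10%, 10-20%, three middle quintiles, 80-90%, and top 10%.
POP_SHARES = np.array([0.10, 0.10, 0.20, 0.20, 0.20, 0.10, 0.10])

def split_country(gni_total, population, income_shares):
    """Per-capita income and population for the 7 groups, assuming income is
    shared equally within each group."""
    shares = np.asarray(income_shares, dtype=float)
    group_pop = population * POP_SHARES
    group_income_pc = gni_total * shares / group_pop
    return group_income_pc, group_pop

# One imaginary country: $2 trillion GNI, 60 million people, made-up income shares.
inc_pc, pop = split_country(2.0e12, 60e6, [0.02, 0.03, 0.10, 0.15, 0.22, 0.18, 0.30])
print(inc_pc.round(0))
```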

To illustrate this, here are India, China, and the USA, showing average incomes for the quantiles and the countries as a whole:

[Figure: average incomes for the quantile groups and national averages in India, China, and the USA]

This shows that the average income of China’s top 10th is between the second and third quintiles of the US income distribution, and the top 10th of India has an average income comparable to the US 10-19th percentile range. Obviously, this breakdown shows a lot more inequality.

So here I add the new Lorenz curve to the first figure, counting each of those 917 quantiles as a separate group with its own income:

[Figure: Lorenz curves for whole countries and for the 917 within-country quantile groups]

Now the Gini index has risen a neat 25%, to an even .60. Is that a big difference? Clearly, between-country inequality — the red line — is vast. If every country were a household, the world would be almost as unequal as Nigeria. In this comparison, you could say 80% of the income inequality shows up just from looking at whole countries. But of course even that obscures much more, especially at the high end, where there is no limit.

Years ago I followed the academic debate over how to measure inequality within and between countries. If I were to catch up with it again, I would start with this article, by my friends Tim Moran and Patricio Korzeniewicz. That provoked a debate over methods and theory, and they eventually published this book, which argues: “within-country analyses alone have not adequately illuminated our understanding of global stratification.” There is a lot more to read, but their work, and the critiques they’ve received, is a good place to start.

Note: I have put my Excel worksheet for this post here. It has the original data and my calculations, but not the figures.

4 Comments

Filed under Uncategorized

Ridiculous NY Times Magazine data graphics

A series of ridiculous data graphics from the NY Times Magazine, collected in one post (with crummy photo renderings).

These are examples of the abuse of data graphic techniques to spread ignorance, distract people from anything of actual importance, and contribute to the perception that statistics – especially graphic statistics – are just an arbitrary way of manipulating people rather than a set of tools for exploring data and attempting to answer real questions. (If you are already convinced of this and just want to see awesome real graphics, I would start with Healy and Moody’s Annual Review of Sociology paper.)

First, an innocent graphic that merely wastes space and contributes nothing — it really communicates less than the 8 simple data points it has because the bats all over are just confusing and the points are in no order (who even notices that the number of segments each bat is cut into is the data point?):

[Graphic: bats cut into segments to represent data points]


Maybe a little better, I suppose, is this one, where the number of trees shown at least corresponds to the data points. But you would still learn more, faster, from a simple list:

[Graphic: trees representing the data points]

Here is an interesting mistake. I first thought these bars were out of order, but it turns out it’s just the top part of the bars that are out of order. If they were flat-topped bars it would be okay:

[Graphic: bars whose tops are out of order]

Here’s one that combines useless graphics with data that is itself completely misleading. These are the fees associated with different parks in NY City. But the units of time are different. What is the point of comparing the annual tennis fee to the hourly roller hockey fee? At least they didn’t make the cards different sizes to show this meaningless comparison more clearly.

[Graphic: cards showing park fees with mismatched time units]

The magazine also does text “analytics.” These are on the letters page, and they show the type of letters received. This is interesting to sociologists, who sometimes try to find ways to categorize text. They make two errors here that render these meaningless or worse.

First, they sometimes present them in order – as represented by graphic elements – when the sentiments expressed are not in that logical order. Like this one, in which the dial and shading imply these are in some logical order, but they aren’t:

[Graphic: letter categories arranged on a shaded dial]

They also did that here, with the shading implying some continuum that is not present. (In this one, also, is it the proportion of the state’s area that determines the size of the cuts, or the angle of the cuts at the center?) Come on!

[Graphic: letter categories shown as shaded cuts of a state outline]

A final point holds for all these letter “analytics”: you really shouldn’t decide how many categories you are going to use before you read the texts: “Here, go break these letters into four categories.” For the love of God, they don’t even have an “other” category, and the categories always add to 100%.

[Graphic: letters broken into four categories]


2 Comments

Filed under In the news

How well do teen test scores predict adult income?

Now with new figures and notes added at the end — and a new, real life headline and graph illustrating the problem in the middle!

The short answer is, pretty well. But that’s not really the point.

In a previous post I complained about various ways of collapsing data before plotting it. Although this is useful at times, and inevitable to varying degrees, the main danger is the risk of inflating how strong an effect seems. So that’s the point about teen test scores and adult income.

If someone told you that the test scores people get in their late teens were highly correlated with their incomes later in life, you probably wouldn’t be surprised. If I said the correlation was .35, on a scale of 0 to 1, that would seem like a strong relationship. And it is. That’s what I got using the National Longitudinal Survey of Youth. I compared Armed Forces Qualification Test scores, taken in 1999, when the respondents were ages 15-19, with their household income in 2011, when they were 27-31.*
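The correlation itself is a one-liner. Here is a sketch with simulated stand-in data (the NLSY extract isn’t reproduced here), with the noise tuned so the correlation lands near .35:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_248                                            # same sample size as below
afqt = rng.uniform(0, 100, n)                        # made-up test scores
log_income = 0.01 * afqt + rng.normal(0, 0.77, n)    # noise chosen so r comes out near .35

r = np.corrcoef(afqt, log_income)[0, 1]
print(round(r, 2), round(r ** 2, 2))                 # r ~ .35, so r^2 ~ .12 of the variance
```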

Here is the linear fit between these two measures, with the 95% confidence interval shaded, showing just how confident we can be in this incredibly strong relationship:

[Figure: linear fit of adult household income on teen AFQT score, with the 95% confidence interval shaded]

That’s definitely enough for a screaming headline, “How your kids’ test scores tell you whether they will be rich or poor.”

In fact, since I originally wrote this, the Washington Post Wonkblog published a post with the headline, “Here’s how much your high school grades predict your future salary,” with this incredibly tidy graph:

[Figure: the Wonkblog graph of earnings by high school GPA]

No doubt these are strong relationships. My correlation of .35 means AFQT explains 12% of the variation in household income. But take heart, ye parents in the age of uncertainty: 12% of the variation leaves a lot unexplained. This variable can’t account for how creative your children are, how sociable, how attractive, how driven, how entitled, how connected, or how White they may be. To get a sense of all the other things that matter, here is the same data, with the same regression line, but now with all 5,248 individual points plotted as well (which means we have to rescale the y-axis):

[Figure: the same regression line with all 5,248 individual points plotted]

Each dot is a person’s life — or two aspects of it, anyway — with the virtually infinite sources of variability that make up the wonder of social existence. All of a sudden that strong relationship doesn’t feel like something you can bank on with any given individual. Yes, there are very few people from the bottom of the test-score distribution who are now in the richest households (those clipped by the survey’s topcode and pegged at 3 on my scale), and hardly anyone from the top of the test-score distribution who is now completely broke.

But I would guess that for most kids a better predictor of future income would be spending an hour interviewing their parents and high school teachers, or spending a day getting to know them as a teenager. But that’s just a guess (and that’s an inefficient way to capture large-scale patterns).

I’m not here to argue about how much various measures matter for future income, or whether there is such a thing as general intelligence, or how heritable it is (my opinion is that a test such as this, at this age, measures what people have learned much more than a disposition toward learning inherent at birth). I just want to give a visual example of how even a very strong relationship in social science usually represents a very messy reality.

Post-publication addendums

1. Prediction intervals

I probably first wrote about this difference between the slope and the variation around the slope two years ago, in a futile argument against the use of second-person headlines such as “Homophobic? Maybe You’re Gay.” Those headlines always try to turn research into personal advice, and are almost always wrong.

Carter Butts, in personal correspondence, offered an explanation that helps make this clear. The “you” type headline presents a situation in which you — the reader — are offered the chance to add yourself to the study. In that case, your outcome (the “new response” in his note) is determined by both the line and the variation around the line. Carter writes:

the prediction interval for a new response has to take into account not only the (predicted) expectation, but also the (predicted) variation around that expectation. A typical example is attached; I generated simulated data (N=1000) via the indicated formula, and then just regressed y on x. As you’d expect, the confidence bands (red) are quite narrow, but the prediction bands (green) are large – in the true model, they would have a total width of approximately 1, and the estimated model is quite close to that. Your post nicely illustrated that the precision with which we can estimate a mean effect is not equivalent to the variation accounted for by that mean effect; a complementary observation is that the precision with which we can estimate a mean effect is not equivalent to the accuracy with which we can predict a new observation. Nothing deep about that … just the practical points that (1) when people are looking at an interval, they need to be wary of whether it is a confidence interval or a prediction interval; and (2) prediction interval can (and often should be) wide, even if the model is “good” in the sense of being well-estimated.

And here is his figure. “You” are very likely to be between the green lines, but not so likely to be between the red ones.

[Figure: Carter Butts’s simulated data, with narrow confidence bands (red) and wide prediction bands (green)]
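Carter’s simulation is easy to approximate. Here is a sketch in Python with statsmodels, using my own made-up slope and noise level rather than his exact formula; the point is just that the confidence band for the mean stays narrow while the prediction band for a new observation is roughly one unit wide in total.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 1000
x = rng.uniform(0, 1, n)
y = 0.5 * x + rng.normal(0, 0.25, n)   # invented slope; noise sd of .25 gives a ~1-unit prediction band

fit = sm.OLS(y, sm.add_constant(x)).fit()

grid = sm.add_constant(np.linspace(0, 1, 50))
pred = fit.get_prediction(grid)
ci = pred.conf_int()              # where the *average* y sits at each x (narrow)
pi = pred.conf_int(obs=True)      # where a *new individual* y may land (wide)

print("mean width of confidence band:", round((ci[:, 1] - ci[:, 0]).mean(), 3))
print("mean width of prediction band:", round((pi[:, 1] - pi[:, 0]).mean(), 3))
```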

2. Random other variables

I didn’t get into the substantive issues, which are outside my expertise. However, one suggestion I got was interesting: What about happiness? Without endorsing the concept of “life satisfaction” as measured by a single question, I still think this is a nice addition because it underscores the point of wide variation in how this relationship between test scores and income might be experienced.

So here is the same figure, but with the individuals coded according to how they answered the following question in 2008, when they were ages 24-28: “All things considered, how satisfied are you with your life as a whole these days? Please give me an answer from 1 to 10, where 1 means extremely dissatisfied and 10 means extremely satisfied.” In the figure, Blue is least satisfied (1-6; 21%), Orange is moderately satisfied (7-8; 46%), and Green is most satisfied (9-10; 32%).

[Figure: the same scatterplot, with points colored by life satisfaction]

Even if you squint you probably can’t discern the pattern. Life satisfaction is positively correlated with income at .16, and less so with test scores (.07). Again, significant correlation — not helpful for planning your life.

* I actually used something similar to AFQT: the variable ASVAB, which combines tests of mathematical knowledge, arithmetic reasoning, word knowledge, and paragraph comprehension, and scales them from 0 to 100. For household income, I used a measure of household income relative to the poverty line (adjusted for household size), plus one, and transformed by natural log. I used household income because some good test-takers might marry someone with a high income, or have fewer people in their households — good decisions if your goal is maximizing household income per person.
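For concreteness, the income transformation described in that footnote would look something like this; the names and numbers here are made up for illustration.

```python
import numpy as np

household_income = np.array([12_000, 35_000, 90_000], dtype=float)
poverty_line = np.array([16_000, 20_000, 24_000], dtype=float)   # adjusted for household size
log_income_to_needs = np.log(household_income / poverty_line + 1)
print(log_income_to_needs.round(2))
```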

8 Comments

Filed under Me @ work

Does sleeping with a guy on the first date make him less likely to call back?

I have no idea. But there is a simple reason that it might seem like it does, even if it doesn’t.

Let’s imagine that a woman — we’ll call her “you,” like they do in relationship advice land — is trying to calculate the odds that a man will call back after sex. Everyone tells you that if you sleep with a guy on the first date he is less likely to call back. The theory is that giving sex away at such a low “price” lowers the man’s opinion of you, because everyone thinks sluts are disgusting.* Also, shame on you.

Photo by Emily Hildebrand, from Flickr Creative Commons

So, you ask, does the chance he will call back improve if you wait more dates before having sex with him? You ask around and find that this is actually true: When you or your friends waited till the seventh date, two-thirds of the guys called back, but when you slept with him on the first date, only one in five called back. From the data, it sure looks like sleeping with a guy on the first date reduces the odds he’ll call back.

[Figure: callback rates after first-date sex versus seventh-date sex]

So, does this mean that women make men disrespect them by having sex right away? If that’s true, then the historical trend toward sex earlier in relationships could be really bad for women, and maybe feminism really is ruining society.

Like all theories, this one assumes a lot. It assumes you (women) decide when couples will have sex, because it assumes men always want to, and it assumes men’s opinion of you is based on your sexual behavior. With these assumptions in place, the data appear to confirm the theory.

But what if those assumptions aren’t true? What if couples just have more dates when they enjoy each other’s company, and men actually just call back when they like you? If this is the case, then what really determines whether the guy calls back is how well-matched the couple is, and how the relationship is going, which also determines how many dates you have.

What was missing in the study design was relationship survival odds. Here is a closer look at the same data (not real data), with couple survival added:

[Figure: the same callback rates, with couple survival rates added]

By this interpretation, the decision about when to have sex is arbitrary and doesn’t affect anything. All that matters is how much the couple like and are attracted to each other, which determines how many dates they have, and whether the guy calls back. Every couple has a first date, but only a few make it to the seventh date. It appears that the first-date-sex couples usually don’t last because people don’t know each other very well on first dates and they have a high rate of failure regardless of sex. The seventh-date-sex couples, on the other hand, usually like each other more and they’re very likely to have more dates. And: there are many more first-date couples than seventh-date couples.
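If you like, you can check the logic with a quick simulation in which the timing of sex has no causal effect at all. The callback gap still appears, because couples who make it to a seventh date are selected for being well-matched. All the parameters below are made up; with these particular choices the rates happen to land close to the one-in-five and two-thirds figures above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Latent match quality drives everything; the timing of sex is independent of it.
quality = rng.beta(1, 4, n)            # most couples are poorly matched
sex_date = rng.integers(1, 8, n)       # the date (1..7) on which sex would happen

# After each date, the couple continues to another date with probability `quality`.
continues = rng.random((n, 7)) < quality[:, None]

for k in (1, 7):
    reached = continues[:, :k - 1].all(axis=1)   # survived dates 1..k-1
    obs = (sex_date == k) & reached              # couples whose sex actually happened on date k
    # "He calls back" here just means the relationship continues after date k.
    print(f"callback rate after sex on date {k}: {continues[obs, k - 1].mean():.2f}")
```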

So the original study design was wrong. It should have compared call-back rates after first dates, not after first sex. But when you assume sex runs everything, you don’t design the study that way. And by “design the study” I mean “decide how to judge people.”

I have no idea why men call women back after dates. It is possible that when you have sex affects the curves in the figure, of course. (And I know even talking about relationships this way isn’t helping.) But even if sex doesn’t affect the curves, I would expect higher callback rates after more dates.

Anyway, if you want to go on blaming everything bad on women’s sexual behavior, you have a lot of company. I just thought I’d mention the possibility of a more benign explanation for the observed pattern that men are less likely to call back after sex if the sex takes place on the first date.

* This is not my theory.

15 Comments

Filed under In the news