Tag Archives: graphics

African American marital status by age, Du Bois replication edition

At the 1900 Paris Exposition, sociologist W. E. B. Du Bois presented some of the work of his students. In The Scholar Denied: W. E. B. Du Bois and the Birth of Modern Sociology, Aldon Morris writes:

Du Bois’s meticulousness as a teacher is apparent in the charts and graphs that he prepared with his students. For example, as part of his gold medal-winning exhibit for the 1900 Paris Exposition, Du Bois and his students produced detailed hand-drawn artistically colored graphs and charts that depicted the journey of black Georgians from slavery to freedom.

Some of the collection is shown in this post at the Public Domain Review (shared by Tressie McMillan Cottom yesterday); the full collection is online at the Library of Congress (LOC).

The one that caught my eye was this one, showing marital status (“conjugal condition”) by age and sex for the Black population. I can’t find the source details in the LOC record, so I don’t know whether it’s Georgia or national, but I presume it’s from tabulations of the 1890 decennial census or earlier:


It’s artistic, meticulous, clearly informative, and beautiful. So I tried to make a 2015 update to complement it. I used data from the 2015 American Community Survey via IPUMS.org, and did it a little differently.* Most importantly, I added two more conjugal conditions, cohabiting and separated/divorced. Second, I used five-year age groupings all the way up, instead of ten-year groupings. Third, I carried the detailed age groups up to age 85. Here’s what I got:


Some very big differences: much smaller proportions of African Americans are married now, and marriage happens much later. In the 1900 figure, more than 30% of men and 60% of women had been married by age 25; those numbers are 5-6% now. I don’t know how they counted separated/divorced people in 1900, but those numbers are high now, at 31% for women and 24% for men at ages 60-64. Widowhood also comes later now: 42% of women were widowed before age 65 in 1900, compared with only 13% now (of course, that’s off a lower marriage rate, and remarried people are just counted as married). And then there is cohabitation, which the chart doesn’t show for 1900. Note that I included people in same-sex as well as different-sex couples.
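A tabulation like this boils down to weighted percentages within each sex-by-age cell. The post’s own work was done in SAS; what follows is only a minimal Python sketch of the same idea, assuming person records that already carry a status label and a person weight. The field names (`age`, `sex`, `status`, `weight`) and the starting age of 15 are assumptions for illustration, and deriving a “cohabiting” status from household relationship data is a separate step not shown here.

```python
from collections import defaultdict


def age_group(age, start=15, width=5, top=85):
    """Label a five-year age group such as '20-24', or '85+' for the top group.

    Returns None for ages below `start` (the starting age is an assumption).
    """
    if age < start:
        return None
    if age >= top:
        return f"{top}+"
    lo = start + (age - start) // width * width
    return f"{lo}-{lo + width - 1}"


def status_shares(records):
    """Weighted percent distribution of status within sex-by-age cells.

    Each record is a dict with hypothetical keys 'age', 'sex', 'status',
    and 'weight' (a person weight). Returns {(sex, age_group, status): percent}.
    """
    cell_totals = defaultdict(float)    # total weight per (sex, age group)
    status_totals = defaultdict(float)  # weight per (sex, age group, status)
    for r in records:
        group = age_group(r["age"])
        if group is None:
            continue
        cell = (r["sex"], group)
        cell_totals[cell] += r["weight"]
        status_totals[cell + (r["status"],)] += r["weight"]
    return {
        key: 100.0 * w / cell_totals[key[:2]]
        for key, w in status_totals.items()
    }
```

Each sex-by-age cell sums to 100%, which is what lets the bars in both the 1900 and 2015 charts run the full width.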

So, thanks for indulging me. I hope you don’t think it’s frivolous. I just love staring at the old charts, and going through the (very different) steps of replicating it was really satisfying. (I also just love that in another 100 years someone might look back on this and say, “Wait, which one was Earth again?”)

Note: If you want to compare them side-by-side, here’s a go at that. The age ranges don’t line up perfectly but you can get the idea (click to enlarge):

* SAS code, ACS data, images, and the spreadsheet used for this post are shared as an Open Science Framework project, here.


Filed under Me @ work

Marriage and gender inequality in 124 countries

Countries with higher levels of marriage have higher levels of gender inequality. This isn’t a major discovery, but I don’t remember seeing this illustrated before, so I decided to do it. Plus I’m trying to improve my Stata graphing.

I used data from this U.N. report on marriage rates from 2008, restricted to those countries that had data from 2000 or later. To show marriage rates I used the percentage of women ages 30-34 that are currently married. This is thus a combination of marriage prevalence and marriage timing, which is something like the amount of marriage in the country. I got gender inequality from the U.N. Development Programme’s Human Development Report for 2015. The gender inequality index combines the maternal mortality ratio, the adolescent birth rate, the representation of women in the national parliament, the gender gap in secondary education, and the gender gap in labor market participation.

Here is the result. I labeled countries with 49 million population or more in red; a few interesting outliers are also labeled. The line is quadratic, unweighted for population (click to enlarge).

You can see the USA sliding right down that curve toward gender nirvana (not that I’m making a simplistic causal argument).
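The fitted line in the chart is an ordinary (unweighted) quadratic regression of the inequality index on the marriage measure. The post made its charts in Stata; as an illustration only, here is a minimal pure-Python sketch that solves the least-squares normal equations for y = a + bx + cx², where x would be the percent of women 30-34 currently married and y the gender inequality index. Nothing here is the post’s actual code.

```python
def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [v] for row, v in zip(A, b)]  # augmented matrix
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def quad_fit(xs, ys):
    """Unweighted least-squares fit of y = a + b*x + c*x**2; returns (a, b, c)."""
    # Normal equations: (X'X)[j][k] = sum of x**(j+k), (X'y)[j] = sum of y*x**j
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    A = [[s[0], s[1], s[2]],
         [s[1], s[2], s[3]],
         [s[2], s[3], s[4]]]
    return tuple(solve3(A, t))
```

Every country counts equally in this fit, which is exactly why the population question comes up next.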

Note that India and China together are about 36% of the world’s population. They both have nearly universal marriage by age 30-34, but women in China get married about four years later on average. That’s an important part of why China has lower gender inequality (it goes along with more educational access, higher employment levels, politics, history, etc.). China is a major outlier among universal-marriage countries, while India is right on the curve.

Any cross-national comparison has to handle differences in population size: China is 139 times bigger than Sweden. One way to address this is to weight the points by their relative population sizes. If you do that, it doesn’t change the result much, except for China, which in this case changes everything because, in addition to being huge, it breaks the relationship between marriage and gender inequality. Here is the comparison. Now the dots are scaled for population, and the gray line is fit to all the countries except China, while the red line includes China (click to enlarge).

My conclusion is that the gray line is the basic story — more marriage, more gender inequality — with China as an important exception, but that’s up for interpretation.
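Weighting by population can be read as weighted least squares, where each country’s squared residual counts in proportion to its population; that reading is an assumption here, not a description of how the Stata charts were built. This sketch fits the same quadratic with weights and lets you drop named countries, which is the gray-line versus red-line comparison in miniature. The `(name, x, y, weight)` tuple layout is hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def weighted_quad_fit(points, exclude=()):
    """Weighted least-squares fit of y = a + b*x + c*x**2; returns (a, b, c).

    points: iterable of (name, x, y, weight); rows whose name is in
    `exclude` are dropped before fitting.
    """
    rows = [(x, y, w) for name, x, y, w in points if name not in exclude]
    # Weighted normal equations: sums of w*x**(j+k) and w*y*x**j
    s = [sum(w * x ** k for x, _, w in rows) for k in range(5)]
    t = [sum(w * y * x ** k for x, y, w in rows) for k in range(3)]
    A = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    d = det3(A)
    coefs = []
    for j in range(3):
        # Cramer's rule: replace column j with the right-hand side
        Aj = [[t[r] if c == j else A[r][c] for c in range(3)] for r in range(3)]
        coefs.append(det3(Aj) / d)
    return tuple(coefs)
```

A single very heavy point that sits off the curve can pull the whole weighted line toward it, which is the China effect described above.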

I put the data and the code for making the charts in this directory. Feel free to copy and crib, etc.


Filed under Me @ work

NYT magazine infographic: not just dumb and annoying

This graphic from the New York Times Magazine is bad data presented poorly (and reproduced poorly, by my camera phone):


It’s presented poorly because those blood stains are impossible to compare since you can’t discern their edges, and it appears they don’t taper toward the edges at the same rate. Maybe they simply resized one of them to get the relative size, which would be wrong. Anyway, if they cared about communicating the data they probably would have used real data in the first place. (You could also complain that a red speckle-cloud is unfriendly to some color-blind people.)
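The “simply resized” worry has a concrete arithmetic side, assuming the stains are roughly circular. If a shape’s area is supposed to encode the value, its diameter has to scale with the square root of the value ratio; scaling the diameter by the ratio itself inflates the apparent area by the square of that ratio. A small sketch of both calculations (function names are illustrative):

```python
import math


def diameter_for_value(value, ref_value, ref_diameter):
    """Diameter that makes a circle's AREA proportional to `value`."""
    return ref_diameter * math.sqrt(value / ref_value)


def apparent_area_ratio(value, ref_value):
    """Area ratio you get if the diameter is (wrongly) scaled linearly."""
    return (value / ref_value) ** 2
```

For a value twice the reference, the correct diameter grows by about 1.41, while linear scaling makes the shape look four times as big.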

It’s bad data because it’s an online NYT reader survey. Although it comes from the “research and analytics” department (and no, I’m not going to add “analytics” to my Windows dictionary), it reflects unknown sample selection effects on an undefined population. In other words, who cares what they think?

A survey like that would be a start if it were the only way you had to address an important or hard-to-measure question, and if you clearly stated that it was likely unreliable. But in this case there is good, nationally representative data on this very question. So if NYT Magazine wanted to inform its readers, it could have used that.

Here’s the good data, from the General Social Survey, in a graph that is a lot better: it’s easier to read accurately, includes a breakout by strength of opinion, and uses more accessible colors (click to enlarge).

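Summarizing a survey item like this means computing weighted percentages for each response, which can then be collapsed into overall agree and disagree shares while keeping the strength-of-opinion breakout. The sketch below is a minimal illustration with hypothetical response labels and generic weights; it does not use actual GSS variable names or its weighting scheme, which you would take from the GSS documentation.

```python
from collections import defaultdict

# Hypothetical four-point response labels
AGREE = ("strongly agree", "agree")
DISAGREE = ("disagree", "strongly disagree")


def weighted_percents(responses):
    """responses: iterable of (answer, weight). Returns {answer: percent}."""
    totals = defaultdict(float)
    for answer, weight in responses:
        totals[answer] += weight
    total = sum(totals.values())
    return {answer: 100.0 * w / total for answer, w in totals.items()}


def collapse(percents):
    """Roll the four-point answers up into overall agree/disagree shares."""
    return {
        "agree": sum(percents.get(a, 0.0) for a in AGREE),
        "disagree": sum(percents.get(a, 0.0) for a in DISAGREE),
    }
```

Plotting the four detailed percents as stacked segments within the two collapsed totals gives the strength-of-opinion breakout.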

I think the NYT Magazine graphics violations are not just dumb and annoying (here’s another post all about them); they harm the public good. Graphics like this spread ignorance and contribute to the perception that statistics – especially graphic statistics – are just an arbitrary way of manipulating people rather than a set of tools for exploring data and attempting to answer real questions. (If you want awesome real graphics, check out Healy and Moody’s Annual Review of Sociology paper.)

P.S., I wrote more about spanking here.


Filed under Uncategorized

Why I called it The Family, and what that has to do with Cosby

First, a note on language

In American English books from 1910 to 1950, about 25% of the uses of “family” were preceded by “the.” Starting about 1950, however, “the family” started falling out of fashion, finally dropping below 16% of “family” uses in the mid-2000s. This trend coincides with the modern rise of family diversity.
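The measure in that first sentence is a simple ratio: occurrences of “family” immediately preceded by “the,” divided by all occurrences of “family.” The figures above come from a corpus of published books; this is only a sketch of the ratio applied to raw text, with a naive tokenizer that ignores “families” and other variants.

```python
import re


def the_share(text, noun="family"):
    """Percent of occurrences of `noun` immediately preceded by 'the'.

    Returns None if the noun never appears.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = [i for i, tok in enumerate(tokens) if tok == noun]
    if not hits:
        return None
    with_the = sum(1 for i in hits if i > 0 and tokens[i - 1] == "the")
    return 100.0 * with_the / len(hits)
```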

In her classic 1993 essay, “Good Riddance to ‘The Family’,” Judith Stacey wrote,

no positivist definition of the family, however revisionist, is viable. … the family is not an institution, but an ideological, symbolic construct that has a history and a politics.

The essay was in Journal of Marriage and the Family, published by the National Council on Family Relations. In 2001, in a change that as far as I can tell was never announced, JMF changed its name to Journal of Marriage and Family, which some leaders of NCFR believed would make it more inclusive. It was the realization of Stacey’s argument.

I decided on the title very early in the writing of my book: The Family: Diversity, Inequality, and Social Change. I agreed with Stacey that the family is not an institution. Instead, I think it’s an institutional arena: the social space where family interactions take place. I wanted to replace the narrowing, tradition-bound term with an expansive, open-ended concept that was big enough to capture both the legal definition and the diversity of personal definitions. I think we can study and teach the family without worrying that we’re imposing a singular definition of what that means.*

It takes the unique genius that great designers have to capture a concept like this in a simple, eye-catching image. Here is how the artists at Kiss Me I’m Polish did it:


What goes in the frame? What looks like a harmless ice-breaker project — draw your family! — is also a conceptual challenge. Is it a smiling, generic nuclear family? A family oligarchy? Or a fictional TV family providing cover for an abusive, larger-than-life father figure who lectures us about morality while concealing his own serial rape behind a bland picture frame?

Whose function?

Like any family sociologist, I have great respect for Andrew Cherlin. I have taught from his textbook, as well as The Marriage Go-Round, and I have learned a lot from his research, which I cite often. But there is one thing in Public and Private Families that always rubbed me the wrong way when I was teaching: the idea that families are defined by positive “functions.”

Here’s the text box he uses in Chapter 1 (of an older edition, but I don’t think it’s changed), to explain his concept:


I have grown more sympathetic to the need for simplifying tools in a textbook, but I still find this too one-sided. Cherlin’s public family has the “main functions” of child-rearing and care work; the private family has “main functions” of providing love, intimacy, and emotional support. Where is the abuse and exploitation function?

That’s why one of the goals that motivated me to finish the book was to see the following passage in print before lots of students. It’s now in Chapter 12: Family Violence and Abuse:

We should not think that there is a correct way that families are “supposed” to work. Yes, families are part of the system of care that enhances the lived experience and survival of most people. But we should not leap from that observation to the idea that when family members abuse each other, it means that their families are not working. … To this way of thinking, the “normal” functions of the family are positive, and harmful acts or outcomes are deviations from that normal mode.

The family is an institutional arena, and the relationships between people within that arena include all kinds of interactions, good and bad. … And while one family member may view the family as not working—a child suffering abuse at the hands of a trusted caretaker, for example—from the point of view of the abuser, the family may in fact be working quite well, regarding the family as a safe place to carry out abuse without getting caught or punished. Similarly, some kinds of abuse—such as the harsh physical punishment of children or the sexual abuse of wives—may be expected outcomes of a family system in which adults have much more power than children and men (usually) have more power than women. In such cases, what looks like abuse to the victims (or the law) may seem to the abuser like a person just doing his or her job of running the family.

Huxtable family secrets

Which brings us to Bill Cosby. After I realized how easy it was to drop photos into my digital copy of the book cover, I made a series of them to share on social media (and planned to use them in an introductory lecture) to promote this framing device for the book. On September 20th of this year I made this figure and posted it in a tweet commemorating the 30th anniversary of The Cosby Show:


Ah, September. When I was just another naïve member of the clueless-American community, using a popular TV family to promote my book, blissfully unaware of the fast-approaching marketing train wreck beautifully illustrated by this graph of internet search traffic for the term “Cosby rape”:


I was never into The Cosby Show, which ran from my senior year in high school through college graduation (not my prime sitcom years). I love lots of families, but I don’t love “the family” any more than I love “society.” Like all families, the Huxtables would have had secrets if they were real. But now we know that even in their fictional existence they did have a real secret. Like some real families, the Huxtables were a device for the family head’s abuse of power and sexuality.

So I don’t regret putting them in the picture frame. Not everything in there is good. And when it’s bad, it’s still the family.

* Of course, I’m also the crank sociologist who doesn’t like to pluralize the terms sexuality, masculinity, or identity when used as objects of study. There are lots of different identities, I reckon, and I study any number of them when I’m studying identity. So call me the new old fashioned.


Filed under Me @ work

Ridiculous NY Times Magazine data graphics

A series of ridiculous data graphics posts from the NY Times Magazine, collected in one post (with crummy photo-pic renderings).

These are examples of the abuse of data graphic techniques to spread ignorance, distract people from anything of actual importance, and contribute to the perception that statistics – especially graphic statistics – are just an arbitrary way of manipulating people rather than a set of tools for exploring data and attempting to answer real questions. (If you are already convinced of this and just want to see awesome real graphics, I would start with Healy and Moody’s Annual Review of Sociology paper.)

First, an innocent graphic that merely wastes space and contributes nothing — it really communicates less than the 8 simple data points it has because the bats all over are just confusing and the points are in no order (who even notices that the number of segments each bat is cut into is the data point?):


Maybe a little better, I suppose, is this one, where the number of trees shown at least corresponds to the data points. But you would still learn more, faster, from a simple list:


Here is an interesting mistake. I first thought these bars were out of order, but it turns out it’s just the tops of the bars that are out of order. If they were flat-topped bars it would be okay:


Here’s one that combines useless graphics with data that is itself completely misleading. These are the fees associated with different parks in New York City, but the units of time are different. What is the point of comparing the annual tennis fee to the hourly roller hockey fee? At least they didn’t make the cards different sizes to show this meaningless comparison more clearly.


The magazine also does text “analytics.” These are on the letters page, and they show the type of letters received. This is interesting to sociologists, who sometimes try to find ways to categorize text. They make two errors here that render these meaningless or worse.

First, they sometimes present them in order – as represented by graphic elements – when the sentiments expressed are not in that logical order. Like this one, in which the dial and shading imply these are in some logical order, but they aren’t:

They also did that here, with the shading implying some continuum that is not present. (In this one, also, is it the proportion of the state’s area that determines the size of the cuts, or the angle of the cuts at the center?) Come on!

A final point holds for all these letter “analytics.” You really shouldn’t determine the number of categories you are going to use before you read the texts: “Here, go break these letters into four categories.” For the love of God, they don’t even have an “other” category, and the categories always add to 100%.



Filed under In the news

Change scatter plots

Until recently I had never read Edward Tufte’s book The Visual Display of Quantitative Information. (I have a lot of practice but almost no training in visual presentation of data.)

How do you describe the change in one variable between two points in time? Here’s an example of a “slopegraph” of the kind Tufte likes (many examples here). He takes a list of 15 countries’ government receipts as a percentage of GDP for 1970 and 1979, and produces this simple graph:


He likes it because all the ink is data (he’s inexplicably invested in the conservation of ink). And he likes how it’s easy to see the change for each country, as well as the two ranked lists for each time point, and those with unusual changes, such as Britain, the only country with a decline. Those are strengths, and this kind of graph is often great. An alternative is a change scatter plot. Here it is with the same data:

In this you can see the overall upward movement (points over the red line), and specifics such as the three countries that moved as a group from the 40-50 percent range to the 50-60 percent range. It also allows a vertical reading, to make comparisons between countries that started the 1970s similarly, such as Switzerland and Greece, Italy and the US, and Belgium and Canada, to see how they diverged, with Switzerland, Italy, and Belgium all moving up more during the decade.
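For anyone who wants to try one, a change scatter plot needs very little: put each case’s first-period value on the x-axis and its second-period value on the y-axis, and draw the diagonal y = x as the no-change reference, so points above it increased. Here is a minimal dependency-free Python sketch that returns an SVG string; the axis range, sizes, and styling are assumptions, and the example values in any test are hypothetical rather than Tufte’s data.

```python
import html


def change_scatter_svg(data, lo=0.0, hi=100.0, size=400, pad=40):
    """Render a minimal change scatter plot as an SVG string.

    data: dict mapping label -> (value_at_time_1, value_at_time_2).
    Points above the red diagonal increased; points below it decreased.
    """
    span = size - 2 * pad

    def px(v):  # data value -> horizontal pixel
        return pad + (v - lo) / (hi - lo) * span

    def py(v):  # data value -> vertical pixel (SVG y grows downward)
        return size - pad - (v - lo) / (hi - lo) * span

    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">',
        # Diagonal "no change" reference line, y = x
        f'<line x1="{px(lo):.1f}" y1="{py(lo):.1f}" x2="{px(hi):.1f}" '
        f'y2="{py(hi):.1f}" stroke="red"/>',
    ]
    for label, (before, after) in data.items():
        x, y = px(before), py(after)
        parts.append(f'<circle cx="{x:.1f}" cy="{y:.1f}" r="3"/>')
        parts.append(f'<text x="{x + 5:.1f}" y="{y - 4:.1f}" font-size="10">'
                     f'{html.escape(label)}</text>')
    parts.append("</svg>")
    return "\n".join(parts)


def increased(data):
    """Labels whose second value exceeds the first (points above the diagonal)."""
    return sorted(k for k, (before, after) in data.items() if after > before)
```

Writing the returned string to a `.svg` file is enough to view it in a browser. The vertical reading described above falls out of the layout: cases with the same starting value line up in the same column.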

I’ve used it in a few cases before, like this graph on changes in marriage rates across 26 countries:


I think the scatter plot approach is especially helpful when you want to see how the change differs at different points in a distribution, or when there are lots of data points.

In a figure from this paper on gender segregation among managers we used it to show how the pace of women’s advance into managerial occupations stalled in the 1990s, by overlaying changes from two time periods on the same figure:


The fact that these lines are essentially parallel is useful and clearly shown. You could make this graph as a slopegraph with three columns, showing two changes, but I don’t think you’d see the pattern as well.

Here’s one I made for something else but haven’t used yet, showing the decline of manufacturing in 50 large metro areas over three decades. In this one they’re all compared with 1980, creating vertical columns of white, gray, and black dots over each metro area’s 1980 starting point.


Tufte would call all that white space above the diagonal a big waste.
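The two multi-period layouts differ only in which value goes on the x-axis. In the metro figure every later year is plotted against the 1980 value, so each case becomes a vertical column; in the managerial figure each period’s change is plotted from its own starting value. A minimal sketch of both reshaping steps, assuming data stored as `{unit: {year: value}}` (a hypothetical layout, with illustrative years):

```python
def baseline_points(series, base):
    """(unit, year, x, y) points with x fixed at each unit's base-year value.

    Every later year is plotted against the same x, so each unit forms a
    vertical column of points (one color per year in the metro figure).
    """
    points = []
    for unit, by_year in series.items():
        x = by_year[base]
        for year in sorted(by_year):
            if year > base:
                points.append((unit, year, x, by_year[year]))
    return points


def period_points(series):
    """(unit, end_year, x, y) points for consecutive periods.

    x is the value at the start of each period and y the value at its end,
    so several periods can be overlaid on one change scatter plot.
    """
    points = []
    for unit, by_year in series.items():
        years = sorted(by_year)
        for y0, y1 in zip(years, years[1:]):
            points.append((unit, y1, by_year[y0], by_year[y1]))
    return points
```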

In the Tufte example above there aren’t many cases so you could label them all. In my marriage example you can figure out the countries based on short abbreviations because the names are familiar. And in the managerial occupations or metro areas it’s the shape of the cloud that matters, so it’s OK not to label them.

Here is an example with a lot of cases, each of which is labeled, from an op-ed by Stephanie Coontz in the New York Times, showing the change in the gender composition of occupations from 1980 to 2010. This one adds a categorical scheme that is supposed to make the types of changes more easily discernible. So those in the top gray box are female-dominated, those in the bottom gray box are male-dominated, and those in the middle are integrated. Green lines denote occupations that entered the integrated zone; red lines denote occupations that became more segregated.

This has a lot of information, but it doesn’t do much more for me than a table would. And the categorical color scheme hides a number of occupations that changed a lot but remained within the arbitrary categories (gray lines). By converting it to a change scatter plot, you can get a sense of the overall pattern of change, and still isolate those with big changes. In the version here I’ve only tagged the ones that changed 20 percentage points or more, so a lot of information is lost, but the graph is a lot smaller, so you could afford to add some text with additional detail.


Here you quickly see that most occupations became more female. And there is a clump of occupations that changed a lot but remained in the middle-range category — medical, education, and human resource managers, and accountants. These were grayed out in the Times version, but they integrated dramatically so you should notice them.
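Isolating the big movers, and checking which of them never crossed a category boundary, takes only a few lines. This sketch flags changes of at least 20 percentage points, as in the version here; the cutoffs separating male-dominated, integrated, and female-dominated zones are left to the caller because the Times graphic’s exact boundaries aren’t stated here. Data are assumed as `{occupation: (percent_female_1980, percent_female_2010)}`, a hypothetical layout.

```python
def big_changes(data, threshold=20.0):
    """Labels whose share changed by at least `threshold` points, largest first."""
    changes = {k: after - before for k, (before, after) in data.items()}
    hits = [k for k, d in changes.items() if abs(d) >= threshold]
    return sorted(hits, key=lambda k: -abs(changes[k]))


def zone(share, low, high):
    """Categorize a percent-female value using caller-chosen cutoffs."""
    if share < low:
        return "male-dominated"
    if share > high:
        return "female-dominated"
    return "integrated"


def hidden_movers(data, low, high, threshold=20.0):
    """Big changers that stayed in the same zone (the 'gray lines')."""
    return [k for k in big_changes(data, threshold)
            if zone(data[k][0], low, high) == zone(data[k][1], low, high)]
```

Occupations returned by `hidden_movers` are the ones a categorical color scheme would gray out despite large changes.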

This might not be the best example, but I like this method of showing within-case changes over time.


Filed under Me @ work