Why you’ll never establish the existence of distinct “generations” in American society

An update from Pew, today’s thoughts, and then another data exercise.

Pew response

After sending our open letter on generation labels to the folks in charge at the Pew Research Center, I received a very friendly email response. They thanked me and reported that they already had plans to begin an internal discussion about “generational research” and will be consulting with experts as they do, although no timeline was given. I take this to mean we have a bona fide opportunity to change course on this issue, both with Pew (which has outsized influence) and more widely, in the coming months. But the outcome is not assured. If you agree that the “generations” labels and the surrounding discourse are causing more harm than good, for researchers and the public, I hope you will join me and the 140+ social scientists who have signed so far by signing and sharing the letter (especially with people who aren’t on Twitter). Thanks!

[Image: avocado toast]

Why “generations” won’t work

Never say never, but I don’t see how it will be possible to identify coherent, identifiable, stable, collectively recognized and popularly understood “generation” categories, based on year of birth, that reliably map onto a diverse set of measurable social indicators. If I’m right about that, which is an empirical question, then whether Pew’s “generations” are correctly defined will never be resolved, because the goal is unattainable. Some other set of birth-year cutoffs might work better for one question or another, but we’re not going to find a set of fixed divisions that works across arenas — such as social attitudes, family behavior, and economic status. So we should instead work on weaning the clicking public from its dependence on the concept and get down to the business of researching social trends (including cohort patterns), and communicating about that research in ways that are intelligible and useful.

Here are some reasons why we won’t find a good set of “generation” boundaries.

1. Mass media and social media mean there are no unique collective experiences

When something “happens” to a particular cohort, lots of other people are affected, too. Adjacent people react, discuss, buy stuff, and define themselves in ways that are affected by these historical events. Gradations emerge. The lines between who is and is not affected can’t be sharply drawn by age.

2. Experiences may be unique, but they don’t map neatly onto attitudes or adjacent behaviors

Even if you can identify something that happened to a specific age group at a specific point in time, the effects of such an experience will be diffuse. To name a few prominent examples: some people grew up in the era of mass incarceration and faced higher risks of being imprisoned, some people entered the job market in 2009 and suffered long-term consequences for their career trajectories, and some people came of age with the Pill. But these experiences don’t mark those people for distinct attitudes or behaviors. Having been incarcerated, unemployed, or in control of your pregnancy may influence attitudes and behaviors, but it won’t set people categorically apart. People whose friends or parents were incarcerated are affected, too; grandparents with unemployed people sleeping on their couches are affected by recessions; people who work in daycare centers are affected by birth trends. And, of course, African Americans have a unique experience with mass incarceration, rich people can ride out recessions, and the Pill is for women. When it comes to indicators of the kind we can measure, effects of these experiences will usually be marginal, not discrete, and not universal. (Plus, as cool new research shows, most people don’t change their minds much after they reach adulthood, so any effects of life experience on attitudes are swimming upstream to be observable at scale.)

3. It’s global now, too

Local experiences don’t translate directly to local attitudes and behavior because we share culture instantly around the world. So, 9/11 happened in the US but everyone knew about it (and there was also March 11 in Spain, and 7/7 in London). There are unique things about them that some people experienced — like having schools closed if you were a kid living in New York — but also general things that affected large swaths of the world, like heightened airline security. The idea of a uniquely affected age group is implausible.

4. Reflexivity

Once word gets out (through research or other means) about a particular trait or practice associated with a “generation,” like avocado toast or student debt, it gets processed and reprocessed reflexively by people who don’t, or do, want to embody a stereotype or trend for their supposed group. This includes identifying with the group itself — some people avoid it, some people embrace it, and some people react to how others embrace or avoid it — until the category falls irretrievably into a vortex of cultural pastiche. The discussion of the categories, in other words, probably undermines the categories as much as it reinforces them.

If all this is true, then insisting on using stable, labeled “generations” just boxes people into useless fixed categories. As the open letter puts it:

Predetermined cohort categories also impede scientific discovery by artificially imposing categories used in research rather than encouraging researchers to make well justified decisions for data analysis and description. We don’t want to discourage cohort and life course thinking, we want to improve it.

Mapping social change

OK, here’s today’s data exercise. There is some technical statistical content here not described in the most friendly way, I’m sorry to say. The Stata code for what follows is here, and the GSS 1972-2018 Cross-Sectional Cumulative Data file is free, here (Stata version); help yourself.

This is just me pushing at my assumptions and supplementing my reading with some tactile data machinations to help it sink in. Following on the previous exercise, here I’ll try out an empirical method for identifying meaningful birth year groupings using attitude questions from the General Social Survey, and then see if they tell us anything, relative to “empty” categories (single years or decades) and the Pew “generations” scheme (Silent, Baby Boom, Generation X, Millennials, Generation Z).

I start with five things that are different about the cohorts of nowadays versus those of the olden days in the United States. These are things that often figure in conversations about generational change. For each of these items I use one or more questions to create a single variable with a mean of 0 and a standard deviation of 1; in each case a higher score is the more liberal or newfangled view. As we’ll see, all of these moved from lower to higher scores as you look at more recent cohorts.

  • Liberal spending: Believing “we’re spending too little money on…” seven things: welfare, the environment, health, big cities, drug addiction, education, and improving the conditions of black people. (For this scale, the measure of reliability [alpha] is .66, which is pretty good.)
  • Gender attitudes: Four questions on whether women are “suited for politics,” working mothers are bad for children, and breadwinner-homemaker roles are good. High scores mean more feminist (alpha = .70).
  • Confidence in institutions: Seven questions on organized religion, the Supreme Court, the military, major companies, Congress, the scientific community, and medicine. High scores mean less confidence (alpha = .68).
  • General political views, from extremely conservative to extremely liberal (one question).
  • Never-none: People who never attend religious services and have no religious affiliation (together now up to about 16% of people).

These variables span the survey years 1977 to 2018, with respondents born from 1910 to 1999 (I dropped a few born in 2000, who were just 18 years old in 2018, and those born before 1910). Because not all questions were asked of all respondents in every year, I lost a lot of people, and I had to make some hard choices about what to include. The sample that answered all these questions is about 5,500 people (down from almost 62,000 altogether — ouch!). Still, what I do next seems to work anyway.
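For concreteness, here is a rough sketch of the scale construction in Stata. This is not the code linked above; the item names and recodes here are illustrative (check the codebook), but it shows the pattern of building each scale with alpha and then standardizing it.

* a sketch, not the posted code; the file name and GSS item names are placeholders
use gss7218, clear
* Liberal spending: seven "spending too little" items, recoded so 1 = too little
foreach v in natfare natenvir natheal natcity natdrug nateduc natrace {
    gen byte lib_`v' = (`v' == 1) if !missing(`v')
}
alpha lib_nat*, std gen(spendraw)      // reports reliability and builds the summed scale
egen spending = std(spendraw)          // mean 0, standard deviation 1
* the gender, confidence, political views, and never-none measures follow the same pattern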

Clustering generations

Once I have these five items, I combine them into a megascale (alpha = .45), which I use to represent social change. You can see in the figure that successive cohorts of respondents are moving up this scale, on average. Note that these cohorts are interviewed at different points in time; for example, a 40-year-old in 1992 is in the same cohort as a 50-year-old in 2002, while the 1977 interviews cover people born all the way back to 1910. That’s how I get so many cohorts out of interviews from just 1977 to 2018 (and why the confidence intervals get bigger for recent cohorts).

The question from this figure is whether the cohort attitude trend would be well served by some strategic cutpoints to denote cohorts (“generations” not in the reproductive sense but in the sense of people born around the same time). Treating each birth year as separate is unwieldy, and the samples are small. We could just use decades of birth, or Pew’s arbitrary “generations.” Or make up new ones, which is what I’m testing out.

So I hit on a simple way to identify cutpoints using an exploratory technique known as k-means clustering, a straightforward (with computers) way to identify the most logical groups of people in a dataset. In this case I used two variables: the megascale and birth year. Stata’s k-means clustering algorithm then tries to find a set of groups of cases such that the differences within them (how far each case is from the means of the two variables within the group) are as small as possible. (You tell it k, the number of groups you want.) Because cohort is a continuous variable, and the megascale rises across cohorts, the algorithm happily puts people in clusters that don’t have overlapping birth years, so I get nicely ordered cohorts. I guess for a U-shaped pattern over time it would put young and old people in the same groups, which would mess this up, but that’s not the case here.
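In Stata, the clustering step looks something like this (a sketch, not the posted code; standardizing both inputs first is one reasonable choice, not something specified above):

* a sketch; megascale is the combined attitude scale, cohort is birth year
egen z_mega = std(megascale)
egen z_cohort = std(cohort)    // keeps birth year from dominating the distance metric
set seed 20210301
cluster kmeans z_mega z_cohort, k(6) name(cohort6)
* check that the clusters partition birth years into non-overlapping ranges
tabstat cohort, by(cohort6) statistics(min max n)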

I tested 5, 6, and 7 groups, thinking more or fewer than that would not be worth it. It turns out 6 groups had the best explanatory power, so I used those. Then I did five linear regressions with the megascale as the dependent variable, a handful of control variables (age, sex, race, region, and education), and different cohort indicators. My basic check of fit is the adjusted R2, or the amount of variance explained adjusted for the number of variables. Here’s how the models did, in order from worst to best:

Cohort variable(s)            Adjusted R2
Pew generations               .1393
One linear cohort variable    .1400
My cluster categories         .1423
Decades of birth              .1424
Each year individually        .1486

Each year is good for explaining variance, but too cumbersome, and the Pew “generations” were the worst (not surprising, since they weren’t concocted to answer this question — or any other question). My cluster categories were better than just entering birth cohort as a single continuous variable, and almost as good as plain decades of birth. My scheme is only six categories, which is more convenient than nine decades, so I prefer it in this case. Note I am not naming them, just reporting the birth-year clusters: 1910-1924, 1925-1937, 1938-1949, 1950-1960, 1961-1974, and 1975-1999. These are temporary and exploratory — if you used different variables you’d get different cohorts.
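For the record, the comparison itself is simple to run (a sketch, not the posted code; pewgen, cohort6, and decade stand in for the categorical schemes):

* a sketch; compares adjusted R2 across cohort specifications
foreach spec in "i.pewgen" "c.cohort" "i.cohort6" "i.decade" "i.cohort" {
    quietly regress megascale `spec' age i.sex i.race i.region educ
    display %-12s "`spec'" " adj. R2 = " %6.4f e(r2_a)
}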

Here’s what they look like with my social change indicators:

Shown this way, you can see the different pace and timing of change for the different indicators — for example, gender attitudes changed most dramatically for cohorts born before 1950, the falling confidence in institutions was over by the end of the 1950s cohort, and the most recent cohort shows the greatest spike in religious never-nones. Social change is fascinating, complex, and uneven!

You can also see that the cuts I’m using here look nothing like Pew’s, which, for example, pool the Baby Boomers from birth years 1946-1964 and the Millennials from 1981 to 1996. And they don’t fit some stereotypes you hear. For example, the group with the least confidence in major institutions is the one born in the 1950s (a slice of Baby Boomers), not Millennials. Try to square these results with the ridiculousness that Chuck Todd recently offered up:

So the promise of American progress is something Millennials have heard a lot about, but they haven’t always experienced it personally. … And in turn they have lost confidence in institutions. There have been plenty of scandals that have cost trust in religious institutions, the military, law enforcement, political parties, the banking system, all of it, trust eroded.

You could delve into the causes of trust erosion (I wrote a paper on confidence in science alone), but attributing a global decline in trust to a group called “Millennials,” one whose boundaries were declared arbitrarily, without empirical foundation, for a completely unrelated purpose, is uninformative at best. Worse, it promotes uncritical, determinist thinking, and — if it gets popular enough — encourages researchers to use the same meaningless categories to try to get in line with the pop culture pronouncements. You get lots of people using unscrutinized categories, compounding their errors. Social scientists have to do better, by showing how cohorts and life course events really are an important way to view and comprehend social change, rather than a shallow exercise in stereotyping.

Conclusion

The categories I came up with here, for which there is some (albeit slim) empirical justification, may or may not be useful. But it’s also clear from looking at the figures here, and the regression results, that there is no singularly apparent way to break down birth cohorts to understand these trends. In fact, a simple linear variable for year of birth does pretty well. These are sweeping social changes moving through a vast, interconnected population over a long time. Each birth cohort is riven with major disparities, along the stratifying lines of race/ethnicity, gender, and social class, as well as many others. There may be times when breaking people down into birth cohorts helps us understand and explain these patterns, but I’m pretty sure we’re never going to find a single scheme that works best across different situations and trends. The best practice is probably to look at the trend in as much detail as possible, to check for obvious discontinuities, and then, if no breaks are apparent, use an “empty” category set, such as decades of birth, at least to start.

It will take a collective act of will by researchers, teachers, journalists, and others to break our social-change-trend industry of its “generations” habit. If you’re a social scientist, I hope you’ll help by signing the letter. (I’m also happy to support other efforts besides this experts’ letter.)


Note on causes

Although I am talking about cohorts, and using regression models where cohort indicators are independent variables, I’m not assessing cohort effects in the sense of causality, but rather common experiences that might appear as patterns in the data. We often experience events through a cohort lens even if they are caused by our aging, or by historical factors that affect everyone. How to distinguish such age, period, and cohort effects in social change is an ongoing subject of tricky research (see this from Morgan and Lee for a recent take using the GSS), but settling it is not required to address the Pew “generations” question: are there meaningful cohorts that experience events in a discernibly collective way, making them useful groups for social analysis?

Author meets critic: Margaret K. Nelson, Like Family

[Image: book cover of Like Family]

These are notes for my discussion of Like Family: Narratives of Fictive Kinship, by Margaret K. Nelson, for an Author Meets Critics session at the Eastern Sociological Society, 21 Feb 2021.

Like Family is a fascinating, enjoyable read, full of thought-provoking analysis and a lot of rich stories, with detailed scenarios that let the reader consider lots of possibilities, even those not mentioned in the text. It’s “economical prose” that suggests lots of subtext and brings to mind a lot of different questions (some of which are in the wide-ranging footnotes).

It’s about people choosing relationships, and choosing to make them be “like” family, and how that means they are and are not “like” family, and in the process tells us a lot about how people think of families altogether, in terms of bonds and obligations and language and personal history.

In my textbook I use three definitions: the legal family, the personal family, and the family as an institutional arena. This is the personal family, which is people one considers family, on the assumption or understanding they feel the same way.

Why this matters, from a demographer’s perspective: Most research uses household definitions of family. That’s partly a matter of measurement — households are something we can count, and they are a way to make sure we get each person only once (without a population registry or universal identification) and correctly attribute births to birth parents. But it comes at a cost – we assume household definitions of family too often.

We need formal, legal categories for things like incest laws and hospital rights, and the categories take on their own power. (Note that there are young adult semi-step siblings, with semi-together parents, living together some of the time or not, wondering about the propriety of sexual relationships with each other.) Reality doesn’t just line up with demographic / legal / bureaucratic categories – there is a dance between them. As the Census “relationship” categories proliferate – from 6 in 1960 to about 16 today – people both want to create new relationships (which Nelson calls a “creative” move) and to make their relationships fit within acceptable categories (like same-sex marriage).


Methods and design

The categories investigated here – sibling-like relationships among adults, temporary adult-adolescent relationships, and informal adoptions – are so very different it’s hard to see what they have in common except some language. The book doesn’t give the formal selection criteria, so it’s hard to know exactly how the boundaries around the sample were drawn.

Nelson uses a very inductive process: “Having identified respondents and created a typology, I could refine both my specific and more general research questions” (p. 11). Not how I think of designing research projects, which just shows the diversity among sociologists.

Over more than one chapter, there is an extended case study of Nicole and her erstwhile guardians Joyce and Don, whom she fell in with when, essentially, her poorer family of origin broke up. Fascinating story.

The book focuses on white, (mostly) straight, middle-class people. This is somewhat frustrating. The rationale is that they are understudied. So that’s useful, but it would be more challenging – I guess a challenge for subsequent research – to more actively analyze their white, straight, middle-classness as part of the research.

Compared to what

A lot of the insights in the book come from people comparing their fictive kin relationships to their other family or friend relationships. This raises a methodological issue: these are people with active fictive kin relationships, so it’s a tricky sample from which to draw conclusions about non-fictive relationships – it’s select. In an ideal world you would have a bigger, unrestricted sample, ask people about all their relationships, and then compare the fictive and non-fictive ones. It’s understandable not to have that, but it needs to be wrestled with (by people doing future research).

Nelson establishes that the sibling-like relationships are neither like friendships nor like family, a third category. But that’s just for these people. Maybe people without fictive kin like this have family or friend relationships that look just like this in terms of reciprocity, obligation, closeness, etc. (Applies especially to the adult-sibling-like relationships.)

Modern contingency

Great insight with regard to adult “like-sibling” relationships: It’s not just that they are not as close as “family,” it’s that they are not “like family” in the sense of “baggage,” they don’t have that “tarnished reality” – and in that sense they are like the way family relationships are moving, more volitional and individualized and contingent.

Does this research show that family relationships generally in a post-traditional era are fluid and ambiguous and subject to negotiation and choice? It’s hard to know how to read this without comparison families. But here’s a thought. John, who co-parents a teenage child named Ricky, says, “To me family means somebody is part of your life that you are committed to. You don’t have to like everything about them, but whatever they need, you’re willing to give them, and if you need something, you’re willing to ask them, and you’re willing to accept if they can or can’t give it to you” (p. 130). It’s an ideal. Is it a widespread ideal? What if non-fictive family members don’t meet that ideal? The implication may be they aren’t your family anymore. Which could be why we are seeing so many people rupturing their family of origin relationships, especially young adults breaking up with their parents.

It reminds me of what happened with marriage half a century ago, where people set a high standard, and defined relationships that didn’t meet it as “not a marriage.” Or when people say abusive families aren’t really families. Conservatives hate this, because it means you can “just” walk away from bad relationships. There are pros and cons to this view.

Nelson writes at the end of the chapter on informal parents, “The possibility is always there that either party will, at some point in the near or distant future, make a different choice. That is both the simple delight and the heartrending anxiety of these relationships” (p. 133). We can’t know, however, how unique such feelings are to these relationships – I suspect not that much. This sounds so much like Anthony Giddens and the “pure” relationships of late modernity.

This contingency comes up a few times, and I always have the same question. Nelson writes in the conclusion, “Those relationships feel lighter, more buoyant, more simply based in deep-seated affection than do those they experience with their ‘real’ kin.” But that tells us how these people feel about real kin, not how everyone does. It raises a question for future research. Maybe outside this population lots of people feel the same way about their “real” kin (ask the growing number of parents who have been “unfriended” by their adult children).

I definitely recommend this book, to read, teach, and use to think about future research.

Note: In the discussion Nelson replied that most people have active fictive-kin relationships, so this sample is not so select in that respect.

Data analysis shows Journal Impact Factors in sociology are pretty worthless

The impact of Impact Factors

Some of this first section is lifted from my blockbuster report, Scholarly Communication in Sociology, where you can also find the references.

When a piece of scholarship is first published it’s not possible to gauge its importance immediately unless you are already familiar with its specific research field. One of the functions of journals is to alert potential readers to good new research, and the placement of articles in prestigious journals is a key indicator.

Since at least 1927, librarians have been using the number of citations to the articles in a journal as a way to decide whether to subscribe to that journal. More recently, bibliographers introduced a standard method for comparing journals, known as the journal impact factor (JIF). This requires data for three years, and is calculated as the number of citations in the third year to articles published over the two prior years, divided by the total number of articles published in those two years.

For example, in American Sociological Review there were 86 articles published in the years 2017-18, and those articles were cited 548 times in 2019 by journals indexed in Web of Science, so the JIF of ASR is 548/86 = 6.37. This allows for a comparison of impact across journals. Thus, the comparable calculation for Social Science Research is 531/271 = 1.96, and it’s clear that ASR is a more widely cited journal. However, comparisons of journals in different fields using JIFs are less helpful. For example, the JIF for the top medical journal, the New England Journal of Medicine, is currently 75, because there are many more medical journals publishing and citing more articles, at higher rates and more quickly, than do sociology journals. (Or maybe NEJM is just that much more important.)

In addition to complications in making comparisons, there are problems with JIFs (besides the obvious limitation that citations are only one possible evaluation metric). They depend on what journals and articles are in the database being used. And they mostly measure short-term impact. Most important for my purposes here, however, is that they are often misused to judge the importance of articles rather than journals. That is, if you are a librarian deciding what journal to subscribe to, JIF is a useful way of knowing which journals your users might want to access. But if you are evaluating a scholar’s research, knowing that they published in a high-JIF journal does not mean that their article will turn out to be important. It is especially wrong to look at an article that’s old enough to have citations you could count (or not) and judge its quality by the journal it’s published in — but people do that all the time.

To illustrate this, I gathered citation data from the almost 2,500 articles published in 2016-2019 in 15 sociology journals from the Web of Science category list.* In JIF these rank from #2 (American Sociological Review, 6.37) to #46 (Social Forces, 1.95). I chose these to represent a range of impact factors, and because they are either generalist journals (e.g., ASR, Sociological Science, Social Forces) or sociology-focused enough that almost any article they publish could have been published in a generalist journal as well. Here is a figure showing the distribution of citations to those articles as of December 2020, by journal, ordered from higher to lower JIF.

After ASR, Sociology of Education, and American Journal of Sociology, it’s hard to see much of a slope here. Outliers might be playing a big role (for example that very popular article in Sociology of Religion, “Make America Christian Again: Christian Nationalism and Voting for Donald Trump in the 2016 Presidential Election,” by Whitehead, Perry, and Baker in 2018). But there’s a more subtle problem, which is the timing of the measures. My collection of articles is 2016-2019. The JIFs I’m using are from 2019, based on citations to 2017-2018 articles. These journals bounce around; for example, Sociology of Religion jumped from 1.6 to 2.6 in 2019. (I address that issue in the supplemental analysis below.) So what is a lazy promotion and tenure committee, which is probably working off a mental reputation map at least a dozen years old, to do?

You can already tell where I’m going with this: In these sociology journals, there is so much noise in citation rates within the journals, compared to any stable difference between them, that outside the very top the journal ranking won’t much help you predict how much a given paper will be cited. If you assume a paper published in AJS will be more important than one published in Social Forces, you might be right, but if the odds that you’re wrong are too high, you just shouldn’t assume anything. Let’s look closer.

Sociology failure rates

I recently read this cool paper (also paywalled, in the Journal of Informetrics) that estimates this “failure probability”: the odds that your guess about which paper will be more impactful, based on the journal title, turns out to be wrong. When JIFs are similar, the odds of an error are very high, like a coin flip. “In two journals whose JIFs are ten-fold different, the failure probability is low,” Brito and Rodríguez-Navarro conclude. “However, in most cases when two papers are compared, the JIFs of the journals are not so different. Then, the failure probability can be close to 0.5, which is equivalent to evaluating by coin flipping.”

Their formulas look pretty complicated to me, so for my sociology approach I just did it by brute force (or, if you need tenure, you could call it a Monte Carlo approach). For each possible pair of journals, I randomly drew 100,000 article pairs, one article from each journal, and calculated the percentage of times the article with more citations came from the journal with the higher impact factor. For example, in 100,000 random pairs drawn from ASR and Social Forces (the two journals with the biggest JIF spread), the ASR article had more citations 73% of the time.
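Here is a rough sketch of that brute-force comparison for one journal pair, using Stata’s Mata. It is not the code posted at the OSF link in the footnote, and it assumes article-level data in memory with a string variable journal and a numeric citation count cites:

set seed 20201215
mata:
    cites   = st_data(., "cites")
    journal = st_sdata(., "journal")
    a = select(cites, journal :== "American Sociological Review")
    b = select(cites, journal :== "Social Forces")
    reps = 100000
    ia = floor(runiform(reps, 1) :* rows(a)) :+ 1   // random article draws from each journal
    ib = floor(runiform(reps, 1) :* rows(b)) :+ 1
    // share of pairs in which the higher-JIF journal's article has strictly more citations
    mean(a[ia] :> b[ib])
end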

Is 73% a lot? It’s better than a coin toss, but I’d hate to have a promotion or hiring decision be influenced by an instrument that blunt. Here are results of the 10.5 million comparisons I made (I love computers). Click to enlarge:

Outside of the ASR column, these are very bad; in the ASR column they’re pretty bad. For example, a random article from AJS only has more citations than one from the 12 lower-JIF journals 59% of the time. So if you’re reading CVs, and you see one candidate with a two-year old AJS article and one with a two-year-old Work & Occupations article, what are you supposed to do? You could compare the actual citations the two articles have gotten, or you could assess their quality of impact some other way. You absolutely should not just skim the CV and assume the AJS article is or will be more influential based on the journal title alone; the failure probability of that assumption is too high.

On my table you can also see some anomalies, of the kind which plague this system. See all that brown in the BJS and Sociology of Religion columns? That’s because both of those journals had sudden increases in their JIF, so their more recent articles have more citations, and most of the comparisons in this table (like in your memory, probably) are based on data from a few years before that. People who published in these journals three years ago are today getting an undeserved JIF bounce from having these titles on their CVs. (See the supplemental analysis below for more on this.)

Conclusion

Using JIF to decide which papers in different sociology journals are likely to be more impactful is a bad idea. Of course, lots of people know JIF is imperfect, but they can’t help themselves when evaluating CVs for hiring or promotion. And when you show them evidence like this, they might say “but what is the alternative?” But as Brito & Rodríguez-Navarro write: “if something were wrong, misleading, and inequitable the lack of an alternative is not a cause for continuing using it.” These error rates are unacceptably high.

In sociology most people won’t own up to relying on impact factors, but most people (in my experience) do judge research by where it’s published all the time. If there is a very big difference in status — enough to be associated with an appreciably different acceptance rate, for example — that’s not always wrong. But it’s a bad default.

In 2015 the biologist Michael Eisen suggested that tenured faculty should remove the journal titles from their CVs and websites, and just give readers the title of the paper and a link to it. He’s done it for his lab’s website, and I urge you to look at it just to experience the weightlessness of an academic space where for a moment overt prestige and status markers aren’t telling you what to think. I don’t know how many people have taken him up on it. I did it for my website, with the explanation, “I’ve left the titles off the journals here, to prevent biasing your evaluation of the work before you read it.” Whatever status I’ve lost I’ve made up for in virtue-signaling self-satisfaction — try it! (You can still get the titles from my CV, because I feel like that’s part of the record somehow.)

Finally, I hope sociologists will become more sociological in their evaluation of research — and of the systems that disseminate, categorize, rank, and profit from it.

Supplemental analysis

The analysis thus far is, in my view, a damning indictment of real-world reliance on the Journal Impact Factor for judging articles, and thus the researchers who produce them. However, it conflates two problems with the JIF. First is the statistical problem of imputing status from an aggregate to an individual, when the aggregate measure fails to capture variation that is very wide relative to the difference between groups. Second, more specific to JIF, is the reliance on a very time-specific comparison: citations in year three to publications in years one and two. Someone could do (maybe already has) an analysis to determine the best lag structure for JIF to maximize its predictive power, but the conclusions from the first problem imply that’s a fool’s errand.

Anyway, in my sample the second problem is clearly relevant. My analysis relies strictly on the rank-ordering provided by the JIF to determine whether article comparisons succeed or fail. However, the sample I drew covers four years, 2016-2019, and counts citations to all of them through 2020. This difference in time window produces a rank ordering that differs substantially (the rank order correlation is .73), as you can see:

In particular, three journals (BJS, SOR, and SFO) moved more than five spots in the ranking. A glance at the results table above shows that these journals are dragging down the matching success rate. To pull these two problems apart, I repeated the analysis using the ranking produced within the sample itself.

The results are now much more straightforward. First, here is the same box plot but with the new ordering. Now you can see the ranking more clearly, though you still have to squint a little.

And in the match rate analysis, the result is now driven by differences in means and variances rather than by the mismatch between JIF and sample-mean rankings (click to enlarge):

This makes a more logical pattern. The most differentiated journal, ASR, has the highest success rate, and the journals closest together in the ranking fail the most. However, please don’t take from this that such a ranking becomes a legitimate way to judge articles. The overall average on this table is still only 58%, up only 4 points from the original table. Even with a ranking that more closely conforms to the sample, this confirms Brito and Rodríguez-Navarro’s conclusion: “[when rankings] of the journals are not so different … the failure probability can be close to 0.5, which is equivalent to evaluating by coin flipping.”

These match numbers are too low to responsibly use in such a way. These major sociology journals have citation rates that are too variable, and too similar at the mean, to be useful as a way to judge articles. ASR stands apart, but only because of the rest of the field. Even judging an ASR paper against its lower-ranked competitors produces a successful one-to-one ranking of papers just 72% of the time — and that only rises to 82% with the least-cited journal on the list.

The supplemental analysis is helpful for differentiating the multiple problems with JIF, but it does nothing to solve the problem of using journal citation rates to evaluate individual articles.


*The data and Stata code I used is up here: osf.io/zutws. This includes the lists of all articles in the 15 journals from 2016 to 2020 and their citation counts as of the other day (I excluded 2020 papers from the analysis, but they’re in the lists). I forgot to save the version of the 100k-case random file that I used to do this, so I guess that can never be perfectly replicated; but you can probably do it better anyway.

COVID-19 mortality rates by race/ethnicity and age

Why are there such great disparities in COVID-19 deaths across race/ethnic groups in the U.S.? Here’s a recent review from New York City:

The racial/ethnic disparities in COVID-related mortality may be explained by increased risk of disease because of difficulty engaging in social distancing because of crowding and occupation, and increased disease severity because of reduced access to health care, delay in seeking care, or receipt of care in low-resourced settings. Another explanation may be the higher rates of hypertension, diabetes, obesity, and chronic kidney disease among Black and Hispanic populations, all of which worsen outcomes. The role of comorbidity in explaining racial/ethnic disparities in hospitalization and mortality has been investigated in only 1 study, which did not include Hispanic patients. Although poverty, low educational attainment, and residence in areas with high densities of Black and Hispanic populations are associated with higher hospitalizations and COVID-19–related deaths in NYC, the effect of neighborhood socioeconomic status on likelihood of hospitalization, severity of illness, and death is unknown. COVID-19–related outcomes in Asian patients have also been incompletely explored.

The analysis, interestingly, found that Black and Hispanic patients in New York City, once hospitalized, were less likely to die than White patients were. Lots of complicated issues here, but some combination of exposure through conditions of work, transportation, and residence; existing health conditions; and access to and quality of care. My question is more basic, though: What are the age-specific mortality rates by race/ethnicity?

Start tangent on why age-specific comparisons are important. In demography, breaking things down by age is a basic first-pass statistical control. Age isn’t inherently the most important variable, but (1) so many things are so strongly affected by age, (2) so many groups differ greatly in their age compositions, and (3) age is so straightforward to measure, that it’s often the most reasonable first cut when comparing groups. Very frequently we find that a simple comparison is reversed when age is controlled. Consider a classic example: mortality in a richer country (USA) versus a poorer country (Jordan). People in the USA live four years longer, on average, but Americans are more than twice as likely to die each year (9 per 1,000 versus 4 per 1,000). The difference is age: 23% of Americans are over age 60, compared with 6% of Jordanians. More old people means more total deaths, but compare within age groups and Americans are less likely to die. A simple separation by age facilitates more meaningful comparison for most purposes. So that’s how I want to compare COVID-19 mortality across race/ethnic groups in the USA. End tangent.

Age-specific mortality rates

It seems like this should be easier, but I can’t find anyone who is publishing them on an ongoing basis. The Centers for Disease Control posts a weekly data file of COVID-19 deaths by age and race/ethnicity, but it does not include the population denominators that you need to calculate mortality rates. So, for example, it tells you that as of December 5 there have been 2,937 COVID-19 deaths among non-Hispanic Blacks in the age range 30-49, compared with 2,186 deaths among non-Hispanic Whites of the same age. So, a higher count of Black deaths. But it doesn’t tell you that there are 4.3 times as many Whites as Blacks in that category — so the Black mortality rate is much higher.

On a different page, they report the percentage of all deaths in each age range that have occurred in each race/ethnic group, but don’t include each group’s percentage of the population. So, for example, 36% of the people ages 30-39 who have died from COVID-19 were Hispanic, and 24% were non-Hispanic White, but that’s not enough information to calculate mortality rates either. I have no reason to think this is nefarious, but it’s clearly not adequate.

So I went to the 2019 American Community Survey (ACS) data distributed by IPUMS.org to get some denominators. These are a little messy for two main reasons. First, ACS is a survey that asks people what their race and ethnicity are, while death counts are based on death certificates, for which the person who has died is not available to ask. So some people will be identified with a different group when they die than they would if they were surveyed. Second, the ACS and other surveys allow people to specify multiple races (in addition to being Hispanic or not), whereas death certificate data generally does not. So if someone who identifies as Black-and-White on a survey dies, how will the death certificate read? (If you’re very interested, here’s a report on the accuracy of death certificates, and here are the “bridges” they use to try to mash up multiple-race and single-race categories.)

My solution to this is to make denominators more or less the way race/ethnicity was defined before multiple-race identification was allowed. I put all Hispanic people, regardless of race, into the Hispanic group. Then I put people who are White, non-Hispanic, and no other race into the White category. And then for the Black, Asian, and American Indian categories, I include people who were multiple-race (and not Hispanic). So, for example, a Black-White non-Hispanic person is counted as Black. A Black-Asian non-Hispanic person is counted as both Black and Asian. Note I did also do the calculations for Native Hawaiian and Other Pacific Islanders, but those numbers are very small so I’m not showing them on the graph; they’re on the spreadsheet. Note also I say “American Indian” to include all those who are “non-Hispanic American Indian or Alaska Native.”

This is admittedly crude, but I suggest that you trust me that it’s probably OK. (Probably OK, that is, especially for Whites, Blacks, and Hispanics. American Indians and Asians have higher rates of multiple-race identification among the living, so I expect there would be more slippage there.)
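With deaths and denominators in hand, the rest is simple division. Here is a minimal sketch (not my actual workflow; the file and variable names are placeholders):

* deaths by race/ethnicity and age group (CDC), merged with ACS population counts
use cdc_covid_deaths, clear
merge 1:1 race agegrp using acs_denominators
gen double rate = 100000 * deaths / pop        // deaths per 100,000
* rate ratios relative to non-Hispanic Whites within each age group
gen double white_rate = rate if race == "White"
bysort agegrp (white_rate): replace white_rate = white_rate[1]
gen double ratio = rate / white_rate
list agegrp race rate ratio, sepby(agegrp) noobs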

Anyway, here’s the absolutely egregious result:

This figure allows race/ethnicity comparisons within the five age groups (under 30 isn’t shown). It reveals that the greatest age-specific disparities are actually at the younger ages. In the range 30-49, Blacks are 5.6 times more likely to die, and Hispanics are 6.6 times more likely to die, than non-Hispanic Whites are. In the oldest age group, over 85, where death rates for everyone are highest, the disparities are only 1.5- and 1.4-to-1, respectively.

Whatever the cause of these disparities, this is just the bottom line, which matters. Please note how very high these rates are at old ages. These are deaths per 100,000, which means that over age 85, 1.8% of all African Americans have died of COVID-19 this year (and 1.7% for Hispanics and 1.2% for Whites). That is — I keep trying to find words to convey the power of these numbers — one out of every 56 African Americans over age 85.

Please stay home if you can.

A spreadsheet file with the data, calculations, and figure, is here: https://osf.io/ewrms/.

Measuring inequality, and what the Gini index does (video)

I produced a short video on measuring inequality, focusing on the construction of the Gini index, the trend in US family inequality, and an example of using it to measure world inequality. It’s 15 minutes, intended for intro-level sociology students.

I like teaching this not because so many of my students end up calculating and analyzing Gini indexes, but because it’s a readily interpretable example of the value of condensing a lot of numbers down to one useful one — which opens up the possibility of the kind of analysis we want to do (Going up? Going down? What about France? etc.). It also helps introduce the idea that social studies of inequality are systematic and scientific, and fun for people who like math, too.
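For anyone who wants to see the arithmetic rather than watch it, here is a minimal brute-force Gini calculation in Stata (a sketch, not taken from the video; it assumes a variable faminc with one observation per family and ignores survey weights):

* Gini by the rank formula: G = 2*sum(i*x_i)/(n*sum(x)) - (n+1)/n, incomes sorted ascending
drop if missing(faminc)
sort faminc
gen long i = _n
quietly summarize faminc
local n = r(N)
local sumx = r(sum)
gen double ix = i * faminc
quietly summarize ix
local G = 2*r(sum)/(`n'*`sumx') - (`n'+1)/`n'
display "Gini = " %5.3f `G'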

The video is below, or you can watch it (along with my other videos) on YouTube. The slides are available here, including one I left out of the video, briefly discussing Corrado Gini and his bad (fascist, eugenicist) politics. Comments welcome.

Framing social class with sample selection

A lot of qualitative sociology makes comparisons across social class categories. Many researchers build class into their research designs by selecting subjects using broad criteria, most often education level, income level, or occupation. Depending on the set of questions at hand, the class selection categories will vary, focusing on, for example, upbringing and socialization, access to resources, or occupational outlook.

In the absence of a substantive review, here are a few arbitrarily selected exemplary books from my areas of research:

This post was inspired by the question Caitlyn Collins asked the other day on Twitter:

She followed up by saying, “Social class is nebulous, but precision here matters to make meaningful claims. What do we mean when we say we’re talking to poor, working class, middle class, wealthy folks? I’m looking for specific demographic questions, categories, scales sociologists use as screeners.” The thread generated a lot of good ideas.

Income, education, occupation

Screening people for research can be costly and time consuming, so you want to maximize simplicity as well as clarity. So here’s a way of looking at some common screening variables, and what you might get or lose by relying on them in different combinations. This uses the 2018 American Community Survey, provided by IPUMS.org (Stata data file and code here).

  • I used income, education, and occupation to identify the status of individuals, and generated household class categories by the presence or absence of types of people in each. That means everyone in each household is in the same class category (a choice you might or might not want to make).
  • Income: Total household income divided by an equivalency scale (for cost of living). The scale counts each adult as 1 person and each child under 18 as .70, then raises that count to the power of .70 (see the code sketch below). I divided the resulting adjusted-income distribution into thirds, so households are in the top, middle, or bottom third. The top third is what I called “middle/upper” class, and the bottom third is “lower class.”
  • Education: I use a BA degree to identify households that do (middle/upper) or do not (lower) have a four-year college graduate present. College graduates are 31% of adults.
  • Occupation: I used the 2018 ACS occupation codes, and coded people as middle/upper class if their code was 10 to 3550, which covers management, business, and financial occupations; computer, engineering, and science occupations; education, legal, community service, arts, and media occupations; and healthcare practitioners and technical occupations. It’s pretty close to what we used to call “managerial and professional” occupations. Together, these account for 37% of workers.

So each of these three variables identifies an upper/middle class status of about a third of people.

For lower class status, you can just reverse them. The exception is income, which is in three categories. For that, I counted households as lower class if their household income was in the bottom third of the adjusted distribution. In the figures below, that means they’re neither middle/upper class nor lower class if they’re in the middle of the income distribution. This is easily adjusted.
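Here is a rough sketch of how the three screeners might be built. It is not the posted code (that’s linked at the end); the variable names follow IPUMS ACS conventions, and the education cutoff is my assumption:

* a sketch; acs2018 is a placeholder for the IPUMS extract
use acs2018, clear
* household composition for the equivalence scale
egen adults = total(age >= 18), by(serial)
egen kids   = total(age < 18),  by(serial)
gen double needs  = (adults + .70*kids)^.70
gen double adjinc = hhincome / needs
xtile inc3 = adjinc, nq(3)                     // thirds of the adjusted income distribution
* any four-year college graduate in the household (IPUMS educd 101+ = BA or higher)
gen byte ba = (educd >= 101) if !missing(educd)
egen hh_ba = max(ba), by(serial)
* any managerial/professional worker (2018 ACS occ codes 10-3550)
gen byte prof = inrange(occ, 10, 3550) if occ != 0
egen hh_prof = max(prof), by(serial)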

Venn diagrams

You can make Venn diagrams in Stata using the pvenn2 add-on, which I naturally discovered after making these. If you must know, I made these by generating tables in Stata, downloading this free plotter app, entering the values manually, copying the resulting figures into Powerpoint and adding the text there, then printing them to PDF, and extracting the images from the PDF using Photoshop. Not a recommended workflow.

Here they are. I hope the visuals might help people think about, for example, who they might get if they screened on just one of these variables, or how unusual someone is who has a high income or occupation but no BA, and so on. But draw your own conclusions (and feel free to modify the code and follow your own approach). Click to enlarge.

First middle/upper class:

[Venn diagram: overlapping middle/upper class definitions]

Then lower class:

[Venn diagram: overlapping lower class definitions]

I said draw your own conclusions, but please don’t draw the conclusion that I think this is the best way to define social class. That’s a whole different question. This is just about simple ways to select people to be research subjects. For other posts on social class, follow this tag, which includes this post about class self-identification by income and race/ethnicity.


Data and code: osf.io/w2yvf/

Divorce fell in one Florida county (and 31 others), and you will totally believe what happened next

You can really do a lot with the common public misperception that divorce is always going up. Brad Wilcox has been taking advantage of that since at least 2009, when he selectively trumpeted a decline in divorce (a Christmas gift to marriage) as if it was not part of an ongoing trend.

I have reported that the divorce rate in the U.S. (divorces per married woman) fell 21 percent from 2008 to 2017.  And yet yesterday, Faithwire’s Will Maule wrote, “With divorce rates rocketing across the country, it can be easy to lose a bit of hope in the God-ordained bond of marriage.”

Anyway, now there is hope, because, as right-wing podcaster Lee Habeeb wrote in Newsweek, THE INCREDIBLE SUCCESS STORY BEHIND ONE COUNTY’S PLUMMETING DIVORCE RATE SHOULD INSPIRE US ALL. In fact, we may be on the brink of Reversing Social Disintegration, according to Seth Kaplan, writing in National Affairs. That’s because of the Culture of Freedom Initiative of the Philanthropy Roundtable (a right-wing funding aggregator run by people like Art Pope, Betsy Devos, the Bradley Foundation, the Hoover Institution, etc.), which has now been spun off as Communio, a marriage ministry that uses marriage programs to support Christian churches. Writes Kaplan:

The program, which has recently become an independent nonprofit organization called Communio, used the latest marketing techniques to “microtarget” outreach, engaged local churches to maximize its reach and influence, and deployed skills training to better prepare individuals and couples for the challenges they might face. COFI highlights how employing systems thinking and leveraging the latest in technology and data sciences can lead to significant progress in addressing our urgent marriage crisis.

The program claims 50,000 people attended four-hour “marriage and faith strengthening programs,” and further made 20 million Internet impressions “targeting those who fit a predictive model for divorce.” So, have they increased marriage and reduced divorce? I don’t know, and neither do they, but they say they do.

Funny aside: the results website today says “Communio at work: Divorce drops 24% in Jacksonville,” but a few days ago the same web page said 28%. That’s probably because Duval County (which is what they’re referring to) just saw a SHOCKING 6% INCREASE IN DIVORCE (my phrase) in 2018 — the 10th-largest divorce rate increase of the 40 counties in Florida for which data are available (see below). But anyway, that’s getting ahead of the story.

Gimme the report

The 28% result came from this report by Brad Wilcox and Spencer James, although they don’t link to it. That’s what I’ll focus on here. The report describes the many hours of ministrations, and the 20 million Internet impressions, and then gets to the heart of the matter:

We answer this question by looking at divorce and marriage trends in Duval County and three comparable counties in Florida: Hillsborough, Orange, and Escambia. Our initial data analysis suggests that the COFI effort with Live the Life and a range of religious and civic partners has had an exceptional impact on marital stability in Duval County. Since 2016, the county has witnessed a remarkable decline in divorce: from 2015 to 2017, the divorce rate fell 28 percent. As family scholars, we have rarely seen changes of this size in family trends over such a short period of time. Although it is possible that some other factor besides COFI’s intervention also helped, we think this is unlikely. In our professional opinion, given the available evidence, the efforts undertaken by COFI in Jacksonville appear to have had a marked effect on the divorce rate in Duval County.

A couple of things about these very strong causal claims. First, they say nothing about how the “comparable counties” were selected. Florida has 67 counties, 40 of which the Census gave me population counts for. Why not use them all? (You’ll understand why I ask when we get to the N=4 regression.) Second, how about that “exceptional impact,” the “remarkable decline” “rarely seen” in their experience as family scholars? Note there is no evidence in the report of the program doing anything, just the three-year trend. And while it is a big decline, it’s one I would call “occasionally seen.” (It helps to know that divorce is generally going down — something the report never mentions.)

To put the decline in perspective, first a quick national look. In 2009 there was a big drop in divorce, accelerating the ongoing decline, presumably related to the recession (analyzed here). It was so big that nine states had crude divorce rate declines of 20% or more in that one year alone. Here is what 2008-2009 looked like:

[Figure: change in crude divorce rates by state, 2008-2009]

So, a drop in divorce on this scale is not that rare in recent times. This is important background Wilcox is (comfortably) counting on his audience not knowing. So what about Florida?

Wilcox and James start with this figure, which shows the number of divorces per 1,000 population in Duval County (Jacksonville) and the three other counties:

[Figure: divorces per 1,000 population in Duval, Hillsborough, Orange, and Escambia counties]

Again, there is no reason given for selecting these three counties. To test the comparison, which evidently shows a faster decline in Duval, they perform two regression models. (To their credit, James shared their data with me when I requested it — although it’s all publicly available, this was helpful to make sure I was doing it the same way they did.) First, I believe they ran a regression with an N of 4, the dependent variable being the 2014-2017 decline in the divorce rate, and the independent variable being a dummy for Duval. I share the complete dataset for this model here:

div_chg duval
1. -1.116101 1
2. -0.2544951 0
3. -0.3307687 0
4. -0.5048307 0
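For what it’s worth, the whole of that first model fits in a few lines of Stata, reproducing the four values above (a sketch, not their code):

* yes, a regression with four observations
clear
input div_chg duval
-1.116101  1
-0.2544951 0
-0.3307687 0
-0.5048307 0
end
regress div_chg duval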

I don’t know exactly what they did with the second model, which must somehow have a larger sample than 4 because it has 8 variables. Maybe 16 county-years? Anyway, it doesn’t much matter. Here is their table:

[Image: regression table from the Wilcox and James report]

How do you evaluate a faster decline amid a general trend toward lower divorce rates? If you really wanted to know whether the program worked, you would have to study the program — people who were in the program and people who weren’t, and so on. (See this writeup of previous marriage promotion disasters, studied correctly, for a good example.) But I’m quite confident that this conclusion is ridiculous and irresponsible: “In our professional opinion, given the available evidence, the efforts undertaken by COFI in Jacksonville appear to have had a marked effect on the divorce rate in Duval County.” No one should take such a claim seriously except as a reflection on the judgment or motivations of its author.

Because the choice of “comparison counties” was bugging me, I got the divorce counts from Florida’s Vital Statistics office (available here) and combined them with Census data on county populations (table S0101 on data.census.gov). Since the 2018 data have now come out, I’m showing the change in each county’s crude divorce rate from 2015, before Communio, through 2018.
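The calculation is simple; here is a minimal sketch (not my actual workflow — the file and variable names are placeholders):

use fl_divorce_counts, clear                  // county, year, divorces
merge 1:1 county year using fl_county_pop     // county, year, pop (ACS table S0101)
gen double cdr = 1000 * divorces / pop        // crude divorce rate per 1,000 population
keep if inlist(year, 2015, 2018)
keep county year cdr
reshape wide cdr, i(county) j(year)
gen double pctchg = 100 * (cdr2018 - cdr2015) / cdr2015
gsort pctchg                                  // biggest declines first
list county cdr2015 cdr2018 pctchg, noobs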

[Figure: change in crude divorce rates, Florida counties, 2015-2018]

You can see that Duval has had a bigger drop in divorce than most Florida counties — 32 of which saw divorce rates fall in this period. Of the counties that had bigger declines, Monroe and Santa Rosa are quite small, but Lake County is mid-sized (population 350,000), and bigger than Escambia, which is one of the comparison counties. How different their report could have been with different comparison cases! This is why it’s a good idea to publicly specify your research design before you collect your data, so people don’t suspect you of data shenanigans like goosing your comparison cases.

What about that 2018 rebound? Wilcox and James stopped in 2017. With the 2018 data we can look further. Eighteen counties had increased divorce rates in 2018, and Duval’s increase was large, at 6%. Two of the comparison cases (Hillsborough and Escambia) had decreases in divorce, as did the state’s largest county, Miami-Dade (down 5%).

To summarize, Duval County had a larger-than-average decline in divorce rates in 2014-2017, compared with the rest of Florida, but then had a larger-than-average increase in 2018. That’s it.

Marriage

Obviously, Communio wants to see more marriage, too, but here not even Wilcox can turn the marriage frown upside down.

[Figure: marriage trends from the Wilcox-James report]

Why no boom in marriage, with all those Internet hits and church sessions? They reason:

This may be because the COFI effort did not do much to directly promote marriage per se (it focused on strengthening existing marriages and relationships), or it may be because the effort ended up encouraging Jacksonville residents considering marriage to proceed more carefully. One other possibility may also help explain the distinctive pattern for Duval County. Hurricane Irma struck Jacksonville in September of 2017; this weather event may have encouraged couples to postpone or relocate their weddings.

OK, got it: so they totally could have increased marriage if they had wanted to. Except for the hurricane. I can’t believe I did this, but I did wonder about the hurricane hypothesis. Here is the number of marriages per month in Duval County, from 13 months before Hurricane Irma (September 2017) to 13 months after, with Septembers highlighted.

[Figure: marriages per month in Duval County, from 13 months before to 13 months after Hurricane Irma, with Septembers highlighted]

There were fewer marriages in September 2017 than in September 2016 (51 fewer), but September is a slow month anyway. And they almost made up for it with a jump in December, which could reflect hurricane-related postponements. But the following September was no better, so this hypothesis doesn’t look good. (Sheesh, how much did they get paid to do this report? I’m not holding back any of the analysis here.)
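
That check is just a year-over-year comparison of monthly counts. A sketch of the sort of thing I did, assuming a file of consecutive monthly marriage counts (file and column names invented):

```python
import pandas as pd

# Hypothetical input (file name invented): marriages recorded in Duval County,
# one row per month, assumed to be consecutive months.
m = pd.read_csv("duval_marriages_monthly.csv", parse_dates=["month"])
m = m.set_index("month").sort_index()

# Year-over-year change for each month, e.g. September 2017 vs. September 2016.
m["yoy_change"] = m["marriages"] - m["marriages"].shift(12)
print(m.loc["2017-08":"2018-09"])
```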

Aside: Kristen & Jessica had a beautiful wedding in Jacksonville just a few days after Hurricane Irma. Jessica recalled, “Hurricane Irma hit the week before our wedding, which damaged our venue pretty badly. As it was outdoors on the water, there were trees down all over the place and flooding… We were very lucky that everything was cleaned up so fast. The weather the day of the wedding turned out to be perfect!” I just had to share this picture, for the Communio scrapbook:

[Wedding photo by Jazi Davis in JaxMagBride]

So, to recap: Christian philanthropists and intrepid social scientists have pretty much reversed social disintegration and the media is just desperate to keep you from finding out about it.

Also, Brad Wilcox lies, cheats, and steals. And the people who believe in him, and hire him to carry their social science water, don’t care.

Do rich people like bad data tweets about poor people? (Bins, slopes, and graphs edition)

Almost 2,000 people retweeted this from Brad Wilcox the other day.

[Screenshot: Brad Wilcox’s tweet sharing Charles Lehman’s graph of TV hours by family income]

Brad shared the graph from Charles Lehman (who noticed later that he had mislabeled the x-axis, but that’s not the point). First, as far as I can tell, the values are wrong. I don’t know how they did it, but when I look at the 2016-2018 General Social Survey, I get 4.3 average hours of TV for people in the poorest families and 1.9 hours for the richest. They report higher highs (looks like 5.3) and lower lows (looks like 1.5). More seriously, I have to object to drawing what purports to be a regression line as if those were evenly spaced income categories, which makes the relationship look much more linear than it is.

I fixed those errors — the correct values, and the correct spacing on the x-axis — then added some confidence intervals, and what I get is probably not worth thousands of self-congratulatory woots, although of course rich people do watch less TV. Here is my figure, with their line (drawn in by hand) for comparison:

[Figure: average TV hours by family income, corrected values with confidence intervals, with their line drawn in for comparison]
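
If you want to check the numbers yourself, the key steps are to take the (weighted) mean of TV hours within each family-income bin and then place those means at the dollar midpoints of the bins rather than at evenly spaced category numbers. A rough sketch, with illustrative column names and made-up midpoints (use the real category bounds):

```python
import pandas as pd

# Hypothetical GSS extract (column names illustrative): TV hours, a binned
# family-income variable, and the survey weight, for the 2016 and 2018 rounds.
gss = pd.read_csv("gss_extract.csv").dropna(subset=["tvhours", "incomebin"])

# Weighted mean TV hours within each income bin.
def wmean(g):
    return (g["tvhours"] * g["weight"]).sum() / g["weight"].sum()

by_bin = gss.groupby("incomebin").apply(wmean)

# Plot the bin means at approximate dollar midpoints, not at 1, 2, 3, ...
# (midpoints invented for illustration).
midpoints = {1: 500, 2: 2500, 3: 6500, 4: 12500, 5: 20000,
             6: 30000, 7: 45000, 8: 62500, 9: 87500, 10: 130000}
plot_df = pd.DataFrame({"mean_tv": by_bin, "income_dollars": by_bin.index.map(midpoints)})
print(plot_df)
```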

Charles and Brad’s post got a lot of love from conservatives, I believe, because it confirmed their assumptions about self-destructive behavior among poor people. That is, here is more evidence that poor people have bad habits that are just dragging them down. But there are reasons this particular graph worked so well. First, the steep slope, which partly results from getting the data wrong. And second, the tight fit of the regression line. That’s why Brad said, “Whoa.” So: good tweet, bad science. (Surprise.) Here are some critiques.

First, this is the wrong survey to use. Since 1975, GSS has been asking people, “On the average day, about how many hours do you personally watch television?” It’s great to have a continuous series on this, but it’s not a good way to measure time use, because people are bad at estimating these things. Also, GSS is not a great survey for measuring income. And it’s a pretty small sample. So if those are the two variables you’re interested in, you should use the American Time Use Survey (available from IPUMS), in which respondents are drawn from the much larger Current Population Survey samples and asked to fill out a time diary. On the other hand, GSS would be good for analyzing, for example, whether people who believe the Bible is “the actual word of God and is to be taken literally, word for word” watch more TV than those who believe it is “an ancient book of fables, legends, history, and moral precepts recorded by men.” (Yes, they do, about an hour more.) Or for looking at all the other social variables GSS is good for.
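
That Bible comparison, by the way, is a one-liner once you have an extract. A sketch, with illustrative column names (an unweighted version, just to show the shape of it):

```python
import pandas as pd

# Hypothetical GSS extract (column names illustrative). In the cumulative file
# the Bible item is coded roughly 1 = word of God, 2 = inspired word, 3 = book of fables.
gss = pd.read_csv("gss_extract.csv")
print(gss.groupby("bible")["tvhours"].mean().round(1))
```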

On the substantive issue, Gray Kimbrough pointed out that the connection between family income and TV time may be spurious, and is certainly confounded with hours spent at work. When I made a simple regression model of TV time with family income, hours worked, age, sex, race/ethnicity, education, and marital status (which, again, should be done better with ATUS), I did find that both hours worked and family income had big effects. Here they are from that model, as predicted values using average marginal effects.

[Figure: predicted TV hours by hours worked and by family income, from the regression model (average marginal effects)]
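
For anyone curious what sits behind that figure, here is a sketch of the same kind of model in Python/statsmodels. I used Stata (with margins); the variable names here are illustrative, and the predictions are computed by hand as average adjusted predictions:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical GSS extract (variable names illustrative).
gss = pd.read_csv("gss_extract.csv").dropna()

# TV hours regressed on family income, hours worked, and the controls listed above.
m = smf.ols("tvhours ~ faminc_dollars + hrs_worked + age + C(sex) + C(race) + educ + C(marital)",
            data=gss).fit()

# Average adjusted predictions: set hours worked to each value of interest,
# keep everyone else's characteristics as observed, and average the predictions.
for hrs in (0, 20, 40, 60):
    print(hrs, round(m.predict(gss.assign(hrs_worked=hrs)).mean(), 2))
```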

The banal observation that people who spend more time working spend less time watching TV probably wouldn’t carry the same punch. Anyway, neither analysis resolves the question of cause and effect.

Fits and slopes

On the issue of the presentation of slopes, there’s a good lesson here. Data presentation involves trading detail for clarity. And statistics have both a descriptive and an analytical purpose. Sometimes we use statistics to present information in simplified form, which allows better comprehension. We also use statistics to discover relationships we couldn’t see otherwise, such as multivariate relationships that can’t be discerned visually. The analyst and communicator has to choose wisely what to present. A good propagandist knows what to manipulate for political effect (a bad one just tweets out crap until they get lucky).

Here’s a much less click-worthy presentation of the relationship between family income and TV time. I truncate the y-axis at 12 hours (cutting off 1% of the sample), translate the binned income categories into dollar values at the middle of each category, and then jitter the scatterplot so you can see how many points are piled up in each spot. The fitted line is Stata’s median spline, with 9 bands specified (so it’s the median hours at the median income in 9 locations on the x-axis). I guess this means that, at the median, rich people in America watch about an hour less TV per day than poor people, and the action is mostly under $50,000 per year. Woot.

[Figure: jittered scatterplot of TV hours by family income, with a 9-band median spline]
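
In case the median spline is unfamiliar: before fitting the connecting curve, it boils the data down to cross-medians within bands of the x-variable. A rough sketch of that step, with illustrative column names (equal-count bands here, which may not match Stata’s banding exactly):

```python
import pandas as pd

# Hypothetical GSS extract (column names illustrative).
gss = pd.read_csv("gss_extract.csv").dropna(subset=["tvhours", "income_dollars"])
gss = gss[gss["tvhours"] <= 12]  # truncate at 12 hours, as in the figure

# Split the income range into 9 bands and take the median income and median
# TV hours in each band; Stata's mspline connects these cross-medians with a
# smooth spline.
gss["band"] = pd.qcut(gss["income_dollars"], 9, labels=False, duplicates="drop")
print(gss.groupby("band")[["income_dollars", "tvhours"]].median())
```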

Finally, a word about binning and the presentation of data (something I’ve written about before, here and here). We make continuous data into categories all the time, starting from measurement. We usually measure age in years, for example, although we could measure it in seconds or decades. Then we use statistics to simplify information further, for example by reporting averages. In the visual presentation of data, there is a particular problem with using averages or data bins to show relationships — you can show slopes that way nicely, but you run the risk of making relationships look more closely correlated than they are. This happens in the public presentation of data when analysts are showing something of their work product — such as a scatterplot with a fitted line — to demonstrate the veracity of their findings. When they bin the data first, this can be very misleading.

Here’s an example. I took about 1,000 men from the GSS and compared their age and income. Between the ages of 25 and 59, older men have higher average incomes, but the fit is curved, with a peak around 45. Here is the relationship, again using jittering to show all the individuals, with a linear regression line. The correlation is .23.

[Figure: individual incomes by age, with linear fit]

That might be nice to look at, but it’s hard to see the underlying relationship; it’s even hard to see how the fitted line relates to the data. So you might reduce it by showing the average income at each age. By pulling the points together vertically into average bins, this shows the relationship much more clearly. However, it also makes the relationship look much stronger. The correlation in this figure is .65. Now the reader might think, “Whoa.”

[Figure: average income at each year of age, with linear fit]

Note this didn’t change the slope much (it still runs from about $30k to $60k); it just put all the dots closer to the line. Finally, here it is pulling the averages together in horizontal bins, grouping the ages in fives (25-29, 30-34 … 55-59). The correlation shown here is .97.

[Figure: average income in five-year age bins, with linear fit]

If you’re like me, this is when you figured out that reducing this to two dots would produce a correlation of 1.0 (as long as the dots aren’t exactly level).
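
The mechanics are easy to reproduce with simulated data; the point is that each round of averaging removes vertical scatter and pushes the correlation toward 1 without changing the slope much. A sketch (simulated numbers, not the GSS values):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Simulated stand-in for the GSS example: income rises with age and then
# levels off, plus a lot of individual-level noise. (Not the real GSS data.)
n = 1000
age = rng.integers(25, 60, n)
income = 30000 + 30000 * (1 - np.exp(-(age - 25) / 12)) + rng.normal(0, 25000, n)
df = pd.DataFrame({"age": age, "income": income})

# 1. Individual points: modest correlation.
print(df["age"].corr(df["income"]).round(2))

# 2. Average income at each single year of age: same slope, less scatter,
#    higher correlation.
by_age = df.groupby("age", as_index=False)["income"].mean()
print(by_age["age"].corr(by_age["income"]).round(2))

# 3. Average income within 5-year age bins: almost all the scatter is gone,
#    so the correlation approaches 1.
df["age_bin"] = (df["age"] // 5) * 5
by_bin = df.groupby("age_bin", as_index=False)["income"].mean()
print(by_bin["age_bin"].corr(by_bin["income"]).round(2))
```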

To make good data presentation tradeoffs requires experimentation and careful exposition. And, of course, transparency. My code for this post is available on the Open Science Framework here (you gotta get the GSS data first).