Here’s the 2021 update of a series I started in 2013. A few pandemic-specific facts below.
If anyone tells you that “facts are useless in an emergency,” give them a bad grade. Knowing basic demographic facts lets us run a quick temperature check on the pot we’re slowly boiling in — which we need to survive. The idea is to get your radar tuned to identify falsehoods as efficiently as possible, to prevent them spreading and contaminating reality. Although I grew up on “facts are lazy and facts are late,” I actually still believe in this mission; I just shake my head slowly while I ramble on about it (and tell the same stories over and over).
This year, in pursuit of this mission, I created the Demographic Fact A Day Twitter account, which started tweeting one fact per day at the start of 2021. Some of these are more advanced, some very simple. Here’s a figure from that account, for a taste:
Everyone likes a number that appears to support their perspective. But that’s no way to run (or change) a society. The trick is to know the facts before you create or evaluate an argument, and for that you need some foundational demographic knowledge. This list of facts you should know is just a prompt to get started in that direction.
The list below consists of demographic facts you need just to get through the day without being grossly misled or misinformed — or, in the case of journalists or teachers or social scientists, not to allow your audience to be grossly misled or misinformed. Not trivia that makes a point or statistics that are shocking, but the non-sensational information you need to make sense of those things when other people use them. And it’s really a ballpark requirement (when I test the undergraduates, I give them credit if they are within 20% of the US population — that’s anywhere between 266 million and 400 million!).
This is only a few dozen facts, not an exhaustive list, but each belongs on any top-100 list. This year, many of the most important facts are about the pandemic, now included here — these are some of what you need to understand the upheavals of the day. Feel free to add additional facts in the comments (as per policy, first-time commenters are moderated).
The numbers are rounded to reasonable units for easy memorization. All refer to the US unless otherwise noted. Most of the links will take you to the latest data:
Joe Pinsker at the Atlantic has written “Amazon Ruined the Name Alexa,” which develops the story of the name. I started tracking the decline after a big drop in 2017, writing: “You have to feel for people who named their daughters Alexa, and the Alexas themselves, before Amazon sullied their names. Did they not think of the consequences for these people? Another bad year for Alexa. After a 21.3% drop in 2016, another 19.5% last year.”
Amazon did not exactly ruin the life of every Alexa, but the consequences of its decision seven years ago are far-reaching—roughly 127,000 American baby girls were named Alexa in the past 50 years, and more than 75,000 of them are younger than 18. Amazon didn’t take their perfectly good name out of malice, but regardless, it’s not giving it back.
From the peak year of 2015, when there were 6,050 Alexas born in the US, the number fell 79% to 1,272 in 2020, the biggest drop among names with at least 1,000 girls born in 2015. Here’s that list:
Pinsker got Amazon on the record not commenting on the problem they created for actual humans named Alexa, who he reports are being bullied in school — they are not only named after a robot, but a subservient female one, so no surprise. Amazon said only, “Bullying of any kind is unacceptable, and we condemn it in the strongest possible terms.”
Cutting room floor
I am only quoted in the story saying, “We don’t usually think about the individuals who are already born when this happens, but the impact on their lives is real as well.” No complaint about that, of course. But since my interview with Pinsker was over email, I can share my other nuggets of insight here, with his questions:
I saw that you first blogged about this in 2018 (when you were remarking on the 2017 name data). Did you just happen to stumble upon Alexa’s declining popularity yourself, or did someone else point it out to you?
I wrote a program that identifies the names with the biggest changes, and Alexa jumped out. One interesting thing about naming patterns is that dramatic changes are quite rare. Names rise and fall over time, but they rarely show giant leaps or collapse as dramatically as Alexa did after 2015.
When you look at what has happened to the name Alexa since Amazon’s Alexa was released in late 2014, how much of the name’s declining popularity do you attribute to Amazon? (Is it common for names to plummet in popularity as quickly as Alexa has since 2014?)
The Social Security national name data is a mile wide and an inch deep. We have a tremendous amount of name data, but it is all just counts of babies born — we have no direct information about who is using what names, or why. So any attribution of causal processes is speculative unless we do other research. That said, because dramatic changes are so rare, it’s usually pretty easy to explain them. For example, some classic 1970s hits apparently sparked name trends: Brandy (Looking Glass, 1972), Maggie (Rod Stewart, 1971), and of course Rhiannon (Fleetwood Mac, 1975). I defy you to find someone named Rhiannon who was born in the US before 1975. We can also observe dramatic changes even among uncommon names, such as a doubling of girls named Malia in 2009 (the Obamas’ daughter’s name).
At one point, you mentioned on your blog that Hillary was another name that became less popular after becoming culturally ubiquitous. Are there any other examples you’re aware of, where a name’s cultural ubiquity tanks its popularity?
On the other hand, there are disaster stories, like Alexa. Hillary was rising in popularity before 1992, and then tanked. Monica declined dramatically after 1998 (after the Clinton sex scandal). Ellen became much less common suddenly the year after Ellen DeGeneres came out as gay in 1997. And Forrest, which had been on the rise before 1994, plummeted after Forrest Gump came out, and has since virtually disappeared.
We don’t usually think about the individuals who are already born when this happens, but the impact on their lives is real as well. The name trends tell us something about the social value of a name (and unlike other commodities, in the US at least there is no limit to the number of people who can have a name). People who were named Adolph before Hitler, Forrest before Forrest Gump, or Alexa before Amazon live with the experience of a devalued name. Many of them end up changing their names or using nicknames — or just getting used to people making jokes about their name every time they meet someone new, have attendance called, or go to the department of motor vehicles.
If I’m reading the SSA data correctly, there were 1,272 Alexas born last year in the U.S. I know this is speculative, but would you guess that most of these parents aren’t aware of the name of Amazon’s device? Or is it that they’re aware, and just don’t care?
Some don’t know, some don’t care, some probably think it’s cool. For some it may be a family name. I am fascinated to see that Alexis and Alexia have also seen five-year declines of more than 60% in name frequency. I wonder if that is because of concern over Alexa devices mishearing those names — certainly a reasonable concern — or maybe just association with the product making those names seem derivative or tacky. It’s hard to say.
Philip N. Cohen criticized the use of generation labels. Generations are one of many analytical lenses researchers use to understand societal change and differences across groups. While there are limitations to generational analysis, it can be a useful tool for understanding demographic trends and shifting public attitudes. For example, a generational look at public opinion on a wide range of social and political issues shows that cohort differences have widened over time on some issues, which could have important implications for the future of American politics.
In addition, looking at how a new generation of young adults experiences key milestones such as educational attainment, marriage or homeownership, compared with previous generations in their youth, can lend important insights into changes in American society.
To be sure, these labels can be misused and lead to stereotyping, and it’s important to stress and highlight diversity within generations. At Pew Research Center, we consistently endeavor to refine and improve our research methods. Therefore, we are having ongoing conversations around the best way to approach generational research. We look forward to engaging with Mr. Cohen and other scholars as we continue to explore this complex and important issue.
Kim Parker, Washington
I was happy to see this, and look forward to what they come up with. I am also glad to see that there has been no substantial defense of the current “generations” research regime. Some people on social media said they kind of like the categories, but no researcher has said they make sense, or pointed to any research justifying the current categories. With regard to her point that generations research is useful, that was in our open letter, and in my op-ed. Cohorts (and, if you want to call a bunch of cohorts a generation, generations) matter a lot, and should be studied. They just shouldn’t be used with imposed fixed categories regardless of the data involved, and given names with stereotyped qualities that are presumed to extend across spheres of social life.
Several people have asked me for suggestions. My basic suggestion is to do like you learned in social science class, and use categories that make sense for a good reason. If you have no reason to use a set of categories, don’t use them. Instead, use an empty measure of time, like years or decades, as a first pass, and look at the data. As I argued here, there is not likely to be a set of birth years that cohere across time and social space into meaningful generational identities.
In the Op-Ed, I wrote this: “Generation labels, although widely adopted by the public, have no basis in social reality. In fact, in one of Pew’s own surveys, most people did not identify the correct generation for themselves — even when they were shown a list of options.” The link was to this 2015 report titled, “Most Millennials Resist the ‘Millennial’ Label” (which of course confirms a stereotype about this supposed generation). I was looking in particular at this graphic, which I have shown often:
It doesn’t exactly show what portion of people “correctly” identify their category, but I eyeballed it and decided that if only 18% of Silents and 40% of Millennials were right, there was no way Gen X and Boomers were bringing the average over 50%. Also, people could choose multiple labels, so those “correct” numbers were presumably inflated to some degree by double-clickers. Anyway, the figure doesn’t exactly answer the question.
The data for that figure come from Pew’s American Trends Panel Wave 10, from 2015. The cool thing is you can download the data here. So I figured I could do a little analysis of who “correctly” identifies their category. Unfortunately, the microdata file they share doesn’t include exact age, just age in four categories that don’t line up with the generations — so you can’t replicate their analysis.
However, they do provide a little more detail in the topline report, here, including reporting the percentage of people in each “generation” who identified with each category. Using those numbers, I figure that 57% selected the correct category, 26% selected an incorrect category, 9% selected “other” (unspecified in the report), and 8% are unaccounted for. So, keeping in mind that people can be in more than one of these groups, I can’t say how many were completely “correct,” but I can say that (according to the report, not the data, which I can’t analyze for this) 57% at least selected the category that matched their birth year, possibly in combination with other categories.
The survey also asked people “how well would you say the term [generation you chose] applies to you?” If you combine “very well” and “fairly well,” you learn, for example, that actual “Silents” are more likely to say “Greatest Generation” applies well to them (32%) than say “Silent” does (14%). Anyway, if I did this right, based on the total sample, 46% of people both “correctly” identified their generation title, and said the term describes them “well.” I honestly don’t know what to make of this, but thought I’d share it, since it could be read as me misstating the case in the Op-Ed.
Micah Altman and I have written a paper using the new Open Editors dataset from Andreas Pacher, Tamara Heck, and Kerstin Schoch. They scraped up data on almost half a million editors (editors in chief, editors, editorial board members) at more than 6,000 journals from 17 publishers (most of the big ones; they’ve since added some more). Micah and I genderized them (fuzzily), geolocated them in countries, and then coded the journals as either open access or not (using the Directory of Open Access Journals), and according to whether they practice transparency in research (using the Transparency and Openness Promotion signatories). After just basic curiosity about diversity, we wondered whether those that practice open access and research transparency have better gender and international diversity.
The results show overwhelming US and European dominance, not surprisingly. And male dominance, which is more extreme among editors in chief, across all disciplines. Open access journals are a little less gender diverse, and transparency-practicing journals a little more internationally diverse, but those relationships aren’t strong. There are other differences by discipline. A network analysis shows not much overlap between journals, outside of a few giant clusters (which might indicate questionable practices) although it’s hard to say for sure — journals should really use ORCIDs for their editors. Kudos to Micah for doing the heavy lifting on the coding, which involved multiple levels of cleaning and recoding (and for making the R markdown file for the whole thing available).
Lots of details in the draft, here. Feedback welcome!
Data from the Social Security Administration show that the names Kobe and Gianna had the greatest increases in popularity of any names in the country in 2020: Kobe rose from 499 boys to 1,500, and Gianna from 3,408 to 7,826 girls. Kobe Bryant and his daughter Gianna died in a helicopter crash on January 26 last year, one of the dramatic national news events eclipsed by the pandemic (George Floyd’s daughter, now 7 years old, is also named Gianna).
The Kobe count of 1,500 was surpassed only in 2001, during his first run of NBA championships, but the number per 1,000 births was higher in 2020. Here is the trend:
And the Gianna trend, with a similar increase off a much higher base. Gianna became the 12th most common name given to girls in 2020.
Other news from the pandemic year in naming
Besides Gianna, not much change in the top 20 names, by gender, as Olivia, Emma, Liam, and Noah continued their dominance. Most of the top 20 names declined in popularity last year.
Outside the top names, the biggest drop in percentage terms (among those with at least 1,000 births) was Alexa, which fell another 36%, from 1,995 in 2019 to 1,272 in 2020. Alexa has had a historically catastrophic decline since Amazon gave the name to its robot shopping companion (discussed last year).
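The percentage declines quoted in these name posts follow directly from the raw SSA counts; here is a quick sketch (the function name is mine, the counts are from the text):

```python
def pct_drop(before, after):
    """Percent decline from one annual name count to the next, rounded."""
    return round(100 * (before - after) / before)

# Counts quoted in the text
from_peak = pct_drop(6050, 1272)       # 2015 peak to 2020: 79
year_over_year = pct_drop(1995, 1272)  # 2019 to 2020: 36
```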
Finally, Mary remains dormant, with 2,188 girls getting the name in 2020, a drop of 21 from 2,209. I told the story of Mary going back to the Revolutionary War on this blog and in Enduring Bonds. Still ripe for a comeback (jinx). Here’s an updated Figure 1:
The Social Security data and Stata code for this analysis are here under a CC0 license: osf.io/m48qc/. Note that SSA updates its denominators every year; I have a file of those in there too.
An update from Pew, today’s thoughts, and then another data exercise.
After sending it to the folks in charge at the Pew Research Center, I received a very friendly email response to our open letter on generation labels. They thanked me and reported that they already had plans to begin an internal discussion about “generational research” and will be consulting with experts as they do, although the timeline was not given. I take this to mean we have a bona fide opportunity to change course on this issue, both with Pew (which has outsized influence) and more widely in the coming months. But the outcome is not assured. If you agree that the “generations” labels and surrounding discourse are causing more harm than good, for researchers and the public, I hope you will join with me and 140+ social scientists who have signed the letter so far, by signing and sharing the letter (especially to people who aren’t on Twitter). Thanks!
Why “generations” won’t work
Never say never, but I don’t see how it will be possible to identify coherent, identifiable, stable, collectively recognized and popularly understood “generation” categories, based on year of birth, that reliably map onto a diverse set of measurable social indicators. If I’m right about that, which is an empirical question, then whether Pew’s “generations” are correctly defined will never be resolved, because the goal is unattainable. Some other set of birth-year cutoffs might work better for one question or another, but we’re not going to find a set of fixed divisions that works across arenas — such as social attitudes, family behavior, and economic status. So we should instead work on weaning the clicking public from its dependence on the concept and get down to the business of researching social trends (including cohort patterns), and communicating about that research in ways that are intelligible and useful.
Here are some reasons why we don’t find a good set of “generation” boundaries.
1. Mass media and social media mean there are no unique collective experiences
When something “happens” to a particular cohort, lots of other people are affected, too. Adjacent people react, discuss, buy stuff, and define themselves in ways that are affected by these historical events. Gradations emerge. The lines between who is and is not affected can’t be sharply drawn by age.
2. Experiences may be unique, but they don’t map neatly onto attitudes or adjacent behaviors
Even if you can identify something that happened to a specific age group at a specific point in time, the effects of such an experience will be diffuse. To name a few prominent examples: some people grew up in the era of mass incarceration and faced higher risks of being imprisoned, some people entered the job market in 2009 and suffered long-term consequences for their career trajectories, and some people came of age with the Pill. But these experiences don’t mark those people for distinct attitudes or behaviors. Having been incarcerated, unemployed, or in control of your pregnancy may influence attitudes and behaviors, but it won’t set people categorically apart. People whose friends or parents were incarcerated are affected, too; grandparents with unemployed people sleeping on their couches are affected by recessions; people who work in daycare centers are affected by birth trends. And, of course, African Americans have a unique experience with mass incarceration, rich people can ride out recessions, and the Pill is for women. When it comes to indicators of the kind we can measure, effects of these experiences will usually be marginal, not discrete, and not universal. (Plus, as cool new research shows, most people don’t change their minds much after they reach adulthood, so any effects of life experience on attitudes are swimming upstream to be observable at scale.)
3. It’s global now, too
Local experiences don’t translate directly to local attitudes and behavior because we share culture instantly around the world. So, 9/11 happened in the US but everyone knew about it (and there was also March 11 in Spain, and 7/7 in London). There are unique things about them that some people experienced — like having schools closed if you were a kid living in New York — but also general things that affected large swaths of the world, like heightened airline security. The idea of a uniquely affected age group is implausible.
4. The categories are reflexive
Once word gets out (through research or other means) about a particular trait or practice associated with a “generation,” like avocado toast or student debt, it gets processed and reprocessed reflexively by people who don’t, or do, want to embody a stereotype or trend for their supposed group. This includes identifying with the group itself — some people avoid it, some embrace it, and some react to what the others are doing in other ways — until the category falls irretrievably into a vortex of cultural pastiche. The discussion of the categories, in other words, probably undermines the categories as much as it reinforces them.
If all this is true, then insisting on using stable, labeled “generations” just boxes people into useless fixed categories. As the open letter puts it:
Predetermined cohort categories also impede scientific discovery by artificially imposing categories used in research rather than encouraging researchers to make well justified decisions for data analysis and description. We don’t want to discourage cohort and life course thinking, we want to improve it.
Mapping social change
OK, here’s today’s data exercise. There is some technical statistical content here not described in the most friendly way, I’m sorry to say. The Stata code for what follows is here, and the GSS 1972-2018 Cross-Sectional Cumulative Data file is free, here (Stata version); help yourself.
This is just me pushing at my assumptions and supplementing my reading with some tactile data machinations to help it sink in. Following on the previous exercise, here I’ll try out an empirical method for identifying meaningful birth year groupings using attitude questions from the General Social Survey, and then see if they tell us anything, relative to “empty” categories (single years or decades) and the Pew “generations” scheme (Silent, Baby Boom, Generation X, Millennials, Generation Z).
I start with five things that are different about the cohorts of nowadays versus those of the olden days in the United States. These are things that often figure in conversations about generational change. For each of these items I use one or more questions to create a single variable with a mean of 0 and a standard deviation of 1; in each case a higher score is the more liberal or newfangled view. As we’ll see, all of these moved from lower to higher scores as you look at more recent cohorts.
Liberal spending: Believing “we’re spending too little money on…” seven things: welfare, the environment, health, big cities, drug addiction, education, and improving the conditions of black people. (For this scale, the measure of reliability [alpha] is .66, which is pretty good.)
Gender attitudes: Four questions on whether women are “suited for politics,” working mothers are bad for children, and breadwinner-homemaker roles are good. High scores mean more feminist (alpha = .70).
Confidence in institutions: Seven questions on organized religion, the Supreme Court, the military, major companies, Congress, the scientific community, and medicine. High scores mean less confidence (alpha = .68).
General political views: from extremely conservative to extremely liberal (one question).
Never-none: People who never attend religious services and have no religious affiliation (together now up to about 16% of people).
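The scale construction behind these items (z-scoring items and checking alpha reliability) can be sketched generically in a few lines. This is illustrative only; the actual analysis was done in Stata:

```python
import statistics as st

def zscore(xs):
    """Standardize a variable to mean 0, standard deviation 1."""
    m, s = st.mean(xs), st.pstdev(xs)
    return [(x - m) / s for x in xs]

def cronbach_alpha(items):
    """Cronbach's alpha reliability, given one list of scores per item."""
    k = len(items)
    totals = [sum(vals) for vals in zip(*items)]   # each respondent's sum score
    item_var = sum(st.variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / st.variance(totals))
```

Each composite is then the average of its z-scored items, re-standardized, so that higher scores mark the more liberal or newfangled view.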
These variables span the survey years 1977 to 2018, with respondents born from 1910 to 1999 (I dropped a few born in 2000, who were just 18 years old in 2018, and those born before 1910). Because not all questions were asked of all the respondents in every year, I lost a lot of people, and I had to make some hard choices about what to include. The sample that answered all these questions is about 5,500 people (down from almost 62,000 altogether — ouch!). Still, what I do next seems to work anyway.
Once I have these five items, I combine them into a megascale (alpha = .45) which I use to represent social change. You can see in the figure that successive cohorts of respondents are moving up this scale, on average. Note that these cohorts are interviewed at different points in time; for example, a 40-year-old in 1992 is in the same cohort as a 50-year-old in 2002, while the 1977 interviews cover people born all the way back to 1910. That’s how I get so many cohorts out of interviews from just 1977 to 2018 (and why the confidence intervals get bigger for recent cohorts).
The question from this figure is whether the cohort attitude trend would be well served by some strategic cutpoints to denote cohorts (“generations” not in the reproductive sense but in the sense of people born around the same time). Treating each birth year as separate is unwieldy, and the samples are small. We could just use decades of birth, or Pew’s arbitrary “generations.” Or make up new ones, which is what I’m testing out.
So I hit on a simple way to identify cutpoints using an exploratory technique known as k means clustering. This is a simple (with computers) way to identify the most logical groups of people in a dataset. In this case I used two variables: the megascale and birth year. Stata’s k means clustering algorithm then tries to find a set of groups of cases such that the differences within them (how far each case is from the means of the two variables within the group) are as small as possible. (You tell it k, the number of groups you want.) Because cohort is a continuous variable, and megascale rises over time, the algorithm happily puts people in clusters that don’t have overlapping birth years, so I get nicely ordered cohorts. I guess for a U-shaped time pattern it would put young and old people in the same groups, which would mess this up, but that’s not the case with this pattern.
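For intuition, here is a toy version of that clustering step in Python rather than Stata: plain Lloyd's algorithm on two variables (a stand-in for birth year and the megascale), with made-up data where the scale rises with birth year, so the clusters come out as ordered cohorts. None of this is the actual analysis code.

```python
import random
import statistics as st

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: assign points to the nearest center, recenter, repeat."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            groups[nearest].append(p)
        new = [tuple(st.mean(dim) for dim in zip(*g)) if g else centers[i]
               for i, g in enumerate(groups)]
        if new == centers:  # converged
            break
        centers = new
    return centers, groups

# Made-up data: the "megascale" rises roughly linearly with birth year,
# so k-means recovers non-overlapping, ordered birth-year clusters.
pts = [(y / 50 - 1, y / 50 - 1 + random.Random(y).uniform(-0.2, 0.2))
       for y in range(100)]
centers, groups = kmeans(pts, 3)
```

Because the scale is monotone in birth year, each cluster covers a contiguous run of birth years, which is what makes the clusters usable as cohort cutpoints.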
I tested 5, 6, and 7 groups, thinking more or fewer than that would not be worth it. It turns out 6 groups had the best explanatory power, so I used those. Then I did five linear regressions with the megascale as the dependent variable, a handful of control variables (age, sex, race, region, and education), and different cohort indicators. My basic check of fit is the adjusted R2, or the amount of variance explained adjusted for the number of variables. Here’s how the models did, in order from worst to best:
Pew “generations”
One linear cohort variable
My cluster categories
Decades of birth
Each year individually
Each year is good for explaining variance, but too cumbersome, and the Pew “generations” were the worst (not surprising, since they weren’t concocted to answer this question — or any other question). My cluster categories were better than just entering birth cohort as a single continuous variable, and almost as good as plain decades of birth. My scheme is only six categories, which is more convenient than nine decades, so I prefer it in this case. Note I am not naming them, just reporting the birth-year clusters: 1910-1924, 1925-1937, 1938-1949, 1950-1960, 1961-1974, and 1975-1999. These are temporary and exploratory — if you used different variables you’d get different cohorts.
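Adjusted R2, the fit statistic behind that ranking, penalizes raw R2 for the number of predictors, which is how a six-category scheme can come close to, or beat, dozens of single-year dummies that explain only slightly more raw variance. A sketch with illustrative numbers (not the actual model results):

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for a model with p predictors and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical: equal raw fit, very different predictor counts
few = adjusted_r2(0.20, n=5500, p=6)    # six cluster categories
many = adjusted_r2(0.20, n=5500, p=89)  # one dummy per birth year
# few > many: with the same raw R2, the leaner scheme wins on adjusted fit
```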
Here’s what they look like with my social change indicators:
Shown this way, you can see the different pace and timing of change for the different indicators — for example, gender attitudes changed most dramatically for cohorts born before 1950, the falling confidence in institutions was over by the end of the 1950s cohort, and the most recent cohort shows the greatest spike in religious never-nones. Social change is fascinating, complex, and uneven!
You can also see that the cuts I’m using here look nothing like Pew’s, which, for example, pool the Baby Boomers from birth years 1946-1964, and Millennials from 1980 to 1996. And they don’t fit some stereotypes you hear. For example, the group with the least confidence in major institutions is those born in the 1950s (a slice of Baby Boomers), not Millennials. Try to square these results with the ridiculousness that Chuck Todd recently offered up:
So the promise of American progress is something Millennials have heard a lot about, but they haven’t always experienced it personally. … And in turn they have lost confidence in institutions. There have been plenty of scandals that have cost trust in religious institutions, the military, law enforcement, political parties, the banking system, all of it, trust eroded.
You could delve into the causes of trust erosion (I wrote a paper on confidence in science alone), but attributing a global decline in trust to a group called “Millennials,” one whose boundaries were declared arbitrarily, without empirical foundation, for a completely unrelated purpose, is uninformative at best. Worse, it promotes uncritical, determinist thinking, and — if it gets popular enough — encourages researchers to use the same meaningless categories to try to get in line with the pop culture pronouncements. You get lots of people using unscrutinized categories, compounding their errors. Social scientists have to do better, by showing how cohorts and life course events really are an important way to view and comprehend social change, rather than a shallow exercise in stereotyping.
The categories I came up with here, for which there is some (albeit slim) empirical justification, may or may not be useful. But it’s also clear from looking at the figures here, and the regression results, that there is no singularly apparent way to break down birth cohorts to understand these trends. In fact, a simple linear variable for year of birth does pretty well. These are sweeping social changes moving through a vast, interconnected population over a long time. Each birth cohort is riven with major disparities, along the stratifying lines of race/ethnicity, gender, and social class, as well as many others. There may be times when breaking people down into birth cohorts helps understand and explain these patterns, but I’m pretty sure we’re never going to find a single scheme that works best for different situations and trends. The best practice is probably to look at the trend in as much detail as possible, to check for obvious discontinuities, and then, if no breaks are apparent, use an “empty” category set, such as decades of birth, at least to start.
It will take a collective act of will by researchers, teachers, journalists, and others to break our social change trend industry of its “generations” habit. If you’re a social scientist, I hope you’ll help by signing the letter. (I’m also happy to support other efforts besides this experts letter.)
Note on causes
Although I am talking about cohorts, and using regression models where cohort indicators are independent variables, I’m not assessing cohort effects in the sense of causality, but rather common experiences that might appear as patterns in the data. We often experience events through a cohort lens even if they are caused by our aging, or historical factors that affect everyone. How to distinguish such age, period, or cohort effects in social change is an ongoing subject of tricky research (see this from Morgan and Lee for a recent take using the GSS), but it’s not required to address the Pew “generations” question: are there meaningful cohorts that experience events in a discernibly collective way, making them useful groups for social analysis?
One of the problems with the fake “generations” discourse is that it confuses people about the importance of age (how old people are) versus cohort (when they were born). At any one point in time, these two concepts are measured the same way — by asking people how old they are. But if you follow people over time, the two concepts answer different questions. The first tells you how younger people are different from older people (for example, old people are more likely to die of heart attacks, because their hearts wear out as they age), while the second tells you how people’s lives may have changed as they live them. So, young people today live in a world where their grandparents are less likely to die of heart attacks because of improvements in medicine, which changes the nature of childhood and then adult life as well. That’s a cohort story.
One of the problems with the way the Pew Research Center reports on their polls is they usually survey people at one point in time (age), and then describe the results by “generation” (cohorts). Here’s a figure they just published:
This seems to tell a clear “generations” story: “OK, Boomer,” that story goes, “it’s time to get serious about climate change.” But why didn’t Pew just title the figure, “younger people more active than older people addressing climate change”? You’d have to ask them their reasons, but their “generations” framing sure is popular. (Pause for outrage over the fact that, not only do the figures not tell readers the ages of these “generations,” treating them as if they are obvious to everyone, but even in the text of the report, where they do mention in passing the birth years for Millennials and Gen Z, they never give the birth years for Gen X and Boomers.)
But what if it’s not a “generations” (cohort) story, but an age story? How would you know? You would ask the same question to people over time, and see how their attitudes may have changed over the course of their lives. That’s difficult and expensive, and besides, the questions you might want to ask change over time. But you can come close by asking a random sample of people the same questions over time — something the clever sociologists behind the General Social Survey figured out 50 years ago, and they’ve been doing it ever since. In fact, the General Social Survey (GSS) since 1973 has been asking people, “are we spending too much, too little, or about the right amount on the environment?” (in another version of the question they added “improving and protecting” before “the environment”; I use both questions). It’s not exactly what Pew asked, but it gives us an idea. And that idea is: this is much more of an age story than a “generations” (cohort) story.
Since the 1970s, between 70% and 75% of Americans under 35 have said we’re spending “too little” on “improving and protecting the environment.” (Note this age breakdown is arbitrary, using a common definition of “young adult.”) It’s not all an age story, though. Over time, this age gap has closed some, which means some of those pro-environment spending young people have grown into pro-environment spending older people — which is a cohort story. But you can’t reasonably tell a dramatic “generations” story about “Generation Z” unless you compare them to people at the same age in the past.
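Mechanically, this kind of repeated-cross-section comparison is just a tabulation of the “too little” share by age group within each survey year. Here is a toy Python sketch — the records below are invented for illustration, not GSS data, and the age cutoff mirrors the arbitrary “under 35” breakdown above:

```python
# Toy repeated cross-sections: (survey_year, age, answered_too_little).
# Invented records, not GSS data.
from collections import defaultdict

records = [
    (1975, 25, True), (1975, 28, True), (1975, 60, False), (1975, 65, False),
    (2018, 24, True), (2018, 30, True), (2018, 62, True), (2018, 70, False),
]

counts = defaultdict(lambda: [0, 0])  # (year, group) -> [too_little, total]
for year, age, too_little in records:
    group = "under 35" if age < 35 else "35+"
    counts[(year, group)][0] += too_little
    counts[(year, group)][1] += 1

# In this made-up data the age gap narrows between 1975 and 2018,
# which is what a cohort story (young people aging in place) looks like.
for (year, group), (k, n) in sorted(counts.items()):
    print(year, group, f"{100 * k / n:.0f}%")
```

Comparing the same age group across survey years (age story) versus following a birth cohort as it ages (cohort story) is just a matter of which margins of this table you read.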
That cohort story
Cohorts, and the life course perspective with which they are associated, are important, however. So telling the cohort story could be valuable. If you were going to tell that story, how would you do it?
First, you could look at survey respondents by individual year of birth, which is the most detailed breakdown you can get from the GSS. One problem with that is sample size: breaking the sample up into 90 or so cohorts leaves you with as few as 21 respondents (and no more than 1,200) for any one cohort who answered this question. The other problem is that it’s harder to interpret because you have a lot of data points.
Second, you could go with some generation scheme you import from another source, from conventional wisdom, or something that might get you more clicks. That might mean using cohorts based on political trends in general, or economic or demographic trends. That is, you could pick a category scheme from somewhere else and hope it’s relevant to this analysis, too. This is the Pew approach. The crucial thing to do in this case, which is what Pew does not do, is acknowledge that you have done something either arbitrarily or based on theory, and that it might or might not be a reasonable fit to your data. Don’t just assume they’re the right categories and pretend no others exist.
Third, in the absence of a clear theory or other data to suggest a way to group individual years, you go with something “empty” and simple, such as binning them by decade of birth.
Fourth, you could try to use empirical methods to identify the best groupings from the data. If you’re better at statistics than me, like Bruce Western and Meredith Kleykamp, you could do something like their “Bayesian Change Point Model for Historical Time Series Analysis” (free copy). Or you could use other methods, or eyeball it. I have experimented with this using a few different GSS trends. Based on variance explained, my very messy results have failed to do better than single years or decades, and done either better or worse than Pew’s generations depending on how I do it. So for now I’ll just show you the results from the first three options here.
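The “empty” binning in the third option is trivial to implement. Here is a minimal Python sketch (the birth years are arbitrary examples, chosen to straddle Pew’s cutoffs):

```python
# Bin individual birth years into "empty" decade-of-birth categories.
def decade_bin(birth_year):
    return (birth_year // 10) * 10

birth_years = [1912, 1958, 1965, 1981, 1997, 1999]
bins = [decade_bin(y) for y in birth_years]
# Note that 1965, 1981, and 1997 -- Pew's "generation" boundaries --
# carry no special status here; they land in ordinary decades.
print(bins)  # [1910, 1950, 1960, 1980, 1990, 1990]
```

The point of an “empty” scheme is exactly that it imposes no substantive claim: any pattern that emerges across the bins comes from the data, not from the category labels.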
Here is Stata code for what follows (and to cut the data by cohort and look for patterns in the variance explained by cohort, which I’m not showing in the post).
Using the three methods — single years of birth, Pew generations, and decades of birth — I made six linear probability models (OLS regressions with a dichotomous dependent variable), with one for each scheme with and without a set of demographic control variables: age (and age-squared), sex, race/ethnicity, region, and education. The main way I compare these models in the table below is with the adjusted R2, which is a measure of variance explained by the model, taking into account the number of variables used. Looking at the adjusted R2 for models 2, 4, and 6 shows that individual years explain the most (.0447), followed by decades (.0440) and then Pew’s generations (.0437). These aren’t big differences, but then the models aren’t explaining that much of this trend altogether. (When I added presumably-endogenous measures for political ideology, political party identification, and church attendance, which are all correlated with views on the environment, I got this adjusted R2 up to .0874.) Note I only used people born from 1910 to 1999 (age 19 in 2018; GSS only interviews adults age 18+).
The linear probability model results are easy to interpret because the coefficients tell you how far each variable moves the average case from 0 (doesn’t think we spend too little) toward 1 (thinks we spend too little). For example, in model 6, the average person born in the 1950s is 11.2% more likely to think this than someone born in the 1910s (controlling for age and the other factors).
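The adjusted R² comparison works by penalizing each model for the number of parameters it spends. Here is a hedged Python sketch of that logic — the R² values and sample size below are illustrative placeholders, not the actual GSS estimates, and the parameter counts are rough (90 single-year dummies versus 9 decade dummies, plus a handful of controls):

```python
# Adjusted R-squared: penalizes a model's fit for its parameter count.
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n = 30_000  # illustrative pooled sample size, not the GSS figure
# A scheme with ~90 single-year dummies must out-explain a scheme with
# ~9 decade dummies by enough raw R-squared to pay for ~81 extra
# parameters; otherwise the coarser scheme wins on adjusted R-squared.
single_years = adjusted_r2(0.0460, n, 90 + 8)  # hypothetical values
decades = adjusted_r2(0.0455, n, 9 + 8)
print(single_years < decades)  # True in this made-up case
```

With samples this large the penalty is small, which is why the single-year model can still come out ahead in the actual results despite spending so many degrees of freedom.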
So the individual-year model performs best, and Pew’s generations perform worst. Here’s what they predict for each cohort, again using models 2, 4, and 6, at the mean of the controls:
The single-year panel (A) shows the messiness of single years, as at the end of the series we have fewer and fewer cases — all the people born in 1999 in the GSS were interviewed in one survey, 2018, and they were all ~19, so we don’t have much to go on; it’s better to pool some of those years for more stable estimates. (You can also see that wider confidence interval for the Pew generations panel (B), where there are only 108 respondents in Generation Z in the data [born before 2000].) Still, you can see that the single-year panel suggests a mild wave, peaking in the 1950s, sliding for people born in the 1970s and 1980s, and rising again. Pew, which cuts the data at 1965, 1981, and 1997, mostly misses this — but it shows up in the decades panel (C). People born in the 1950s and 1960s were most likely to think we spend too little on the environment (65%), those born in the 1970s and 1980s a little less so (63%), and then those born in the 1990s spiked back upward to 68% (this last estimate based on 996 people born in the 1990s).
So, there are some interesting cohort trends here, captured worse by the Pew categories than by decades or single years of birth. Still, none of these cohort differences is as big as the simple age difference I started with. (If you add control variables to the figures at the top, the age gap remains 12% to 19%.) Young people have been, and remain, more likely to say we spend too little money on the environment. That’s not a bad story.
Given its popularity, it’s important to note the Pew model performs worst among these three. Of course, if Pew’s generations were real things, clearly defined with respect to other social trends, associated with social identities, and with empirical research to support their relevance, it might still be justified to analyze this question according to those categories. (For example, even though we don’t measure “race” perfectly, the variable as measured is associated with lots of important disparities and trends, and people feel affinities with their racial identities, so it’s not wrong to look at something like attitudes toward the environment by “race,” as commonly defined.) But the Pew categories are not justified in that way. Any cohort binning will show you something, especially for trends that are strongly patterned across time and age. But it doesn’t follow that any common binning is the best — that’s a question for research, research that hasn’t been done.
Full disclosure: I am an unpaid member of the General Social Survey advisory board. (And, like most people who say “full disclosure,” I’m mentioning this mostly to impress you, and secondarily to disclose any potential conflict of interest.)
We are demographers and other social scientists, writing to urge the Pew Research Center to stop using its generation labels (currently: Silent, Baby Boom, X, Millennial, Z). We appreciate Pew’s surveys and other research, and urge them to bring this work into better alignment with scientific principles of social research.
Pew’s “generations” cause confusion.
The groups Pew calls Silent, Baby Boom, X, Millennial, and Z are birth cohorts determined by year of birth, which are not related to reproductive generations. There is further confusion because their arbitrary lengths (18, 19, 16, 16, and 16 years, respectively) have grown shorter as the age difference between parents and their children has lengthened.
The division between “generations” is arbitrary and has no scientific basis.
With the exception of the Baby Boom, which was a discrete demographic event, the other “generations” have been declared and named on an ad hoc basis without empirical or theoretical justification. Pew’s own research conclusively shows that the majority of Americans cannot identify the “generations” to which Pew claims they belong. Cohorts should be delineated by “empty” periods (such as individual years, equal numbers of years, or decades) unless research on a particular topic suggests more meaningful breakdowns.
Naming “generations” and fixing their birth dates promotes pseudoscience, undermines public understanding, and impedes social science research.
The “generation” names encourage assigning them a distinct character, and then imposing qualities on diverse populations without basis, resulting in the current widespread problem of crude stereotyping. This fuels a stream of circular debates about whether the various “generations” fit their associated stereotypes, which does not advance public understanding.
The popular “generations” and their labels undermine important cohort and life course research.
Cohort analysis and the life course perspective are important tools for studying and communicating social science. But the vast majority of popular survey research and reporting on the “generations” uses cross-sectional data, and is not cohort research at all. Predetermined cohort categories also impede scientific discovery by artificially fixing the categories used in research rather than encouraging researchers to make well-justified decisions for data analysis and description. We don’t want to discourage cohort and life course thinking, we want to improve it.
The “generations” are widely misunderstood to be “official” categories and identities.
Pew’s reputation as a trustworthy social research institution has helped fuel the false belief that the “generations” definitions and labels are social facts and official statistics. Many other individuals and organizations use Pew’s definitions in order to fit within the paradigm, compounding the problem and digging us deeper into this hole with each passing day.
The “generations” scheme has become a parody and should end.
With the identification of “Generation Z,” Pew has apparently reached the end of the alphabet. Will this continue forever, with arbitrarily defined, stereotypically labeled, “generation” names sequentially added to the list? Demographic and social analysis is too important to be subjected to such a fate. No one likes to be wrong, and admitting it is difficult. We sympathize. But the sooner Pew stops digging this hole, the easier it will be to escape. A public course correction from Pew would send an important signal and help steer research and popular discourse around demographic and social issues toward greater understanding. It would also greatly enhance Pew’s reputation in the research community. We urge Pew to end this as gracefully as possible — now.
As consumers of Pew Research Center research, and experts who work in related fields ourselves, we urge the Pew Research Center to do the right thing and help put an end to the use of arbitrary and misleading “generation” labels and names.
One thing Daunte Wright, Philando Castile, Walter Scott, Samuel DuBose, and Rayshard Brooks have in common is that the police who killed them could have accomplished whatever they were legitimately supposed to be doing without a gun on their hip. The police in these incidents had no reason to anticipate violence in the interactions. There was no report of a violent crime, no weapons visible, no sign of anyone in imminent danger. Whether you think the police acted with racist malice, incompetence, or even reasonably, the fact is that if the police who killed them weren’t carrying guns no one would have died.
The structural approaches to police violence introduced in the last year, including reducing police funding to replace them with other agencies and services, involve big, complex proposals. For example, a recent law review article by Jordan Blair Woods reasonably suggests replacing police with unarmed civilian enforcers of traffic codes. These would require changing laws and restructuring government budgets.
A much simpler and immediately effective remedy to at least some of our problem is a simple matter of police department policy: don’t wear your guns.
Whether it was poor training, racism, malice, or just fatally bad luck that led Kimberly Potter to shoot Daunte Wright with her gun instead of her Taser in Brooklyn Center, Minnesota earlier this month, the body camera recording clearly shows she had nothing in her hands just seconds earlier. She didn’t enter the scene with her gun out because there was no reason to suspect violence, and in fact the only violence that occurred was her shooting Wright. If she hadn’t had a gun on her hip, he wouldn’t have died.
For all the talk of “de-escalation” in police interactions with the public, this simple solution is routinely overlooked. In any potentially violent conflict, the stakes are automatically raised to the level of the deadliest weapon present. Guns escalate conflict.
The policy details are important. In a society awash in guns (unlike many of those where police are usually unarmed), police here will sometimes need them for good reasons. You could start with some units dedicated to traffic enforcement, for example. Some police could have guns in a safe in the trunk of their car. Special units could be routinely armed. But the officers who come to your (my) house to discuss online death threats don’t need to be wearing firearms.
There are risks to police from such an approach, but the present default unreasonably assumes that carrying guns only reduces those risks. How often are unarmed police killed at traffic stops? If we don’t know the answer to that, maybe it hasn’t been sufficiently tried. If your response is, “one traffic cop killed is too many,” try applying that logic to the unarmed victims of police.
Even if you believe Darren Wilson, who said Michael Brown tried to take his gun in Ferguson, Missouri in 2014, possession of the gun was the basis of their violent conflict. Even if Darren Wilson had been just as racist in harassing Brown for walking in the street, no one would have died if Wilson hadn’t had a gun.
A Justice Department report on Michael Brown’s death noted, “Under well-established Fourth Amendment precedent, it is not objectively unreasonable for a law enforcement officer to use deadly force in response to being physically assaulted by a subject who attempts to take his firearm.” Well-established, perhaps, but that’s tragically circular – cop has a right to kill someone with his gun who tries to take his gun – because he has a gun.
If Daunte Wright or Michael Brown or George Floyd had resisted arrest, punched an officer, or driven off to escape law enforcement, no one would have died. But that’s not all that would be different. If police in those situations, and millions of others, weren’t carrying guns, we could develop a new mutual understanding between the police and public: Police won’t “accidentally” kill you during a traffic stop or when reacting to nonviolent infractions, but if you do attack unarmed police, more police will show up later and they will have reason to be armed.
What might seem riskier to police upfront – leaving the gun in the trunk, or at the station – would certainly lead to fewer deaths of innocent, unarmed, nonviolent, people. Given the scale of innocent life taken in such incidents, and its effects on relations between the public and the police, that is a paramount concern for equity, civil rights, and law enforcement. But by reducing the stakes of individual interactions with police – automatically de-escalating them – it would probably also end up making the job safer for police as well.
Policing is dangerous work, work the police make more dangerous by introducing firearms into many interactions that should remain nonviolent. Would removing the holster from the standard uniform discourage people from becoming police? To some extent it might. But if not wearing a gun discouraged the kind of person for whom wearing a gun is the best part of the job, so much the better.
In the war between armed police and the unarmed public, the police should unilaterally disarm.
Joe Pinsker at the Atlantic has a piece out on the coming (probable) baby bust. In it he reviews existing evidence for a coming decline in births as a result of the pandemic, especially including historical comparisons and Google search data. Could we see this already?
The baby bust isn’t expected to begin in earnest until December. And it could take a bit longer than that, Sarah Hayford, a sociologist at Ohio State University, told me, if parents-to-be didn’t adjust their plans in response to the pandemic immediately back in March, when its duration wasn’t widely apparent.
If people immediately changed their plans in February, we might see a decline in births in October, but Hayford is right that’s early. And what about September, for which I’ve already observed declining births in Florida and California? If people who were pregnant already in January had miscarriages or abortions because of the pandemic, that would result in fewer births in September, but how big could that effect be? So maybe the Florida and California data are flukes, or data errors, or lots of pregnant people left those states and gave birth elsewhere (or pregnant people who normally come didn’t arrive). Perhaps more likely is that 2020 was already going to be a down year. As I told Pinsker:
“It might actually be that we were already heading for a record drop in births this year … If that’s the case, then birth rates in 2021 are probably going to be even more shockingly low.”
Anyway, we’ll find out soon enough. And to that end I’ve started assembling a dataset of monthly births where I can find them, which so far includes Florida, California, Oregon, Arizona, North Carolina, Ohio, Hawaii, Sweden, Finland, Scotland, and the Netherlands, to varying degrees of timeliness. As of today we have October data for some of them:
As of now Florida and California remain the strongest cases for a pandemic effect. But they are also both likely to add some more births to October (in November’s report, California increased the September number by 3%).
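The comparison behind these monthly series is just a year-over-year percent change for the same calendar month, which sidesteps seasonality in births. A minimal Python sketch (the counts below are invented placeholders, not actual state data):

```python
# Year-over-year percent change for one month's birth count.
# Comparing October to October avoids seasonal patterns in births.
def yoy_change(current, year_ago):
    return 100 * (current - year_ago) / year_ago

oct_2019, oct_2020 = 19_000, 18_200  # hypothetical monthly counts
print(f"{yoy_change(oct_2020, oct_2019):+.1f}%")  # -4.2%
```

One caveat this sketch ignores: as noted above, preliminary counts get revised upward in later reports, so recent months will look artificially low until the data settle.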
Anyway, lots of speculation while we’re killing time. You can get the little dataset here on the Open Science Framework: https://osf.io/pvz3g/. Check the date on the .csv or .xlsx file to see when I last updated it. I’ll add more countries or states if I find out about them.