Adjectives for children’s chronic conditions

In the Google ngrams database of American English, I got relative frequencies of the terms x+children, where x is a chronic malady of some sort. I tried a lot of different ones, and only included ones that topped the list at least once in the past 100 years. The most common (as suggested in the comments below) is “handicapped children,” which dominates all others from 1920 to 1995. After that, this is what I came up with, ordered by the period in which they were #1:

  • 1910s: sickly children
  • 1920s: neurotic children
  • 1930s-1950s: maladjusted children
  • 1965-1975: psychotic children
  • Mid-1970s, briefly: hyperactive children
  • Late 1970s-2000s: disabled children

After the mid-1990s, however, “children with disabilities” becomes more common than any of them. I couldn’t find anything in the old days that was as popular as disabled or hyperactive would later become. Does this imply more concern or negative attention to children?

Here is the figure. The frequency of each term is shown in relation to the total uses of “children” (click to enlarge):


If you think I missed anything, want to play with it yourself, or want to see how I did it, here’s the link.

Another question about the same terms: are they individualized (x-child) or grouped (x-children)? Summing all the terms with child, shown as a percentage of all the terms with children (leaving out “with disabilities”), produces this figure (smoothed to a 10-year curve):


Individualization peaked from 1920 to 1940, when the combined individual terms outnumbered the plural terms, before sliding till 1990. Now we may be in an individualizing rebound. (Here is the link to that search if you’re interested in the coding).
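For anyone who wants to reproduce the arithmetic outside the ngram viewer, here is a minimal sketch of the two calculations described above: singular uses as a percentage of plural uses, and a smoothed version of the yearly series. The counts are made up for illustration, the function names are my own, and the centered moving average is a generic smoother that may differ in detail from whatever the ngram viewer does internally.

```python
def share_of_total(term_counts, total_counts):
    """Yearly frequency of a term relative to total uses of a reference word."""
    return [t / total for t, total in zip(term_counts, total_counts)]


def individualization(child_counts, children_counts):
    """Singular (x-child) uses as a percentage of plural (x-children) uses."""
    return [100.0 * c / p for c, p in zip(child_counts, children_counts)]


def centered_moving_average(values, window):
    """Smooth a yearly series with a centered moving average.

    Near the edges the window is truncated to the years that exist.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical yearly counts, NOT real ngram data.
child = [50, 60, 80, 90, 70]
children = [100, 100, 100, 100, 100]

pct = individualization(child, children)
smoothed = centered_moving_average(pct, window=3)
print(smoothed)
```

A value above 100 in this series would mean the singular terms outnumber the plural ones in that year.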

I get a kick out of language history like this. But I draw no conclusions without further study.



6 family correlations that will blow your mind or break your heart, and that probably aren’t spurious

The title is supposed to be funny.

Some data trends and patterns are correlated just by chance, such as the trends in high fructose corn syrup consumption and the Florida divorce rate. But there are other correlations that, although they seem highly improbable and you might never have predicted them, are not actually spurious. For finding those, there is Google Correlate. Out of the billions of possible correlations with the first term in these pairs – either across states or over time – each of these was in the top 100. The possibility that they are non-spurious is reinforced by the fact that each of these lists includes other similar terms in the top 100.

Searches for “am I pregnant” and “ways to get pregnant,” by state (r=.96):


Searches for “divorce lawyer” and “maserati price,” by state (r=.83):


Searches for “discipline children” and “marriage problems,” by state (r=.89):


Searches for “office jobs” and “bob haircut,” by week (r=.88):


Searches for “vasectomy cost” and “how old is johnny depp,” by state (r=.87):


Searches for “man caves” and “penny from big bang theory,” by state (r=.85):

For my whole series of Google-related posts, follow the tag.



Education, not income, drives Piketty searches

Proving once again that effort is not always correlated with income, I present this critique of a Justin Wolfers blog post…

A lot of people have written reviews of Piketty. The first few pages of a Google search revealed all these (I added Heather Boushey, who wrote a good one)*:


I believe that is diversity, because every human being is different.

Anyway, where to begin? Justin Wolfers wrote a little post, not a review, but it caught my attention. The headline was, “Piketty’s Book on Wealth and Inequality Is More Popular in Richer States.” Distractable, that’s where I began.

Wolfers’ culminating line, “Vive la révolution!”, suited Scott Winship, who looked over Wolfers’ figures before sniping, “the buzz around the book has come mostly from rich liberal states along the Boston-to-Washington corridor.” But I think they’re both misinterpreting.

According to the Google search data Wolfers used, these were the top 10 states for “piketty” searches (Washington, D.C. excluded): Massachusetts, New York, Connecticut, Maryland, New Jersey, Illinois, Pennsylvania, Wisconsin, Oregon, California.

It looks to me that it’s actually education driving the search data. And that is a big difference. Let me explain.

Do data?

Microsoft Word tells me that the reading grade level of the publisher’s excerpt is 16.3, so it takes a 16th-grade education to read it. (Note that the “Boston-to-Washington corridor,” which was supposed to sound like a small sliver of the country, has 26% of the country’s college graduates.) So consider income versus college completion, which we can now take as a proxy for being able to read Piketty.

Wolfers writes, “I can’t tell you where Piketty has been least popular, because below a certain level of search activity, Google doesn’t release the actual numbers.” So he proceeds to leave 24 states out of his analysis (this will become important). Using per-capita income (converted to z-scores), and dropping 24 states plus the ridiculous outlier of DC, this is Wolfers’ income result (my calculations; he just showed scatter plots):


OK, leaving out the bottom half of the Piketty distribution, there is a strong positive relationship between per capita income and Piketty Google searches. Congratulations, you can have three jobs as an economist!

I kid Wolfers. But, come on! I don’t know what kind of data operation they’re running over there at the Upshot, but I would expect Wolfers to take it up a notch. First, control for college completion (percent of folks ages 25+ with a BA or more, also z-scored). See how it shows… oops:


The income effect is reduced but the education effect isn’t significant. (See how I showed you that instead of just going right to the results that support my argument?)
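The mechanics of that step (z-score everything, then regress searches on income and college completion together) can be sketched in a few lines. This is only an illustration under my own assumptions: the numbers are invented, the function names are hypothetical, and it returns slopes only, with no standard errors or t-statistics.

```python
import math


def z_scores(values):
    """Standardize a list to mean 0 and (sample) standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def ols_two_predictors(y, x1, x2):
    """OLS slopes for y on x1 and x2.

    Assumes all three series are already centered (for example z-scored),
    so the intercept is zero and the 2x2 normal equations can be solved
    directly.
    """
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Invented state-level values, for illustration only.
income = z_scores([41.0, 45.0, 52.0, 38.0, 60.0, 47.0])
college = z_scores([24.0, 30.0, 35.0, 22.0, 38.0, 27.0])
searches = z_scores([0.2, 0.9, 1.4, 0.1, 1.6, 0.5])

b_income, b_college = ols_two_predictors(searches, income, college)
print(round(b_income, 3), round(b_college, 3))
```

With income and college completion as highly correlated as they are across states, the two slopes can move around a lot depending on which cases are in the sample, which is the problem the rest of this post runs into.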

But go back to Wolfers leaving out the bottom half of the Piketty distribution. What’s wrong with that? I’m sure there’s some statistical way of explaining that, but just eyeballing it you’d have to say dropping those cases could cause trouble. The censored cases all have values of -.64 on the search variable. The relationship with income is weaker when the censored cases are included (shown in the red line) versus when he limits it to the top half of Piketty states (blue line):


What to do about this? An easy thing is just to include the censored cases at their values of -.64, just pretending -.64 is a legitimate value. That gives:


Now the income effect is reduced by about three-quarters, and the college completion effect is three times as large (with a t-statistic to match).

But that’s not the best way to handle this. If only economists had invented a way of modeling data with censored dependent variables! Just kidding: there’s Tobin’s Tobit. This kind of model says, I see your censored dependent variable, and I crash it through the bottom of the distribution as a function of its linear relationship to your independent variables. So instead of all being -.64, it lets the censored cases be as low as they want to be, with values predicted by income and college completion. Sort of. Anyway, here’s that result:


Now income is crushed, reduced to literal insignificance. What matters is the percentage of the population that has completed college. It’s not that rich people like Piketty, it’s that college graduates do. Maybe because that’s who can read it. (I don’t know, I haven’t tried.)
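For the curious, here is a rough sketch of what a Tobit model with a floor actually evaluates. It is not the routine I ran, just the textbook log-likelihood for a left-censored outcome written in plain Python; the function names and the toy data are assumptions for illustration. Observed cases contribute the usual normal density, while censored cases contribute the probability that the latent value fell at or below the floor.

```python
import math


def normal_logpdf(z):
    """Log density of the standard normal at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def normal_cdf(z):
    """Cumulative distribution of the standard normal at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_loglik(beta, sigma, X, y, lower):
    """Log-likelihood of a Tobit model left-censored at `lower`.

    Each row of X should start with 1.0 if an intercept is wanted.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    total = 0.0
    for row, yi in zip(X, y):
        mu = sum(b * x for b, x in zip(beta, row))
        if yi <= lower:
            # Censored case: all we know is that the latent value is <= floor.
            p = normal_cdf((lower - mu) / sigma)
            total += math.log(max(p, 1e-300))
        else:
            # Observed case: ordinary normal regression contribution.
            total += normal_logpdf((yi - mu) / sigma) - math.log(sigma)
    return total


# Toy data: latent outcome is 0.5 * x, floored at -0.64 (values are invented).
floor = -0.64
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
X = [[1.0, x] for x in xs]
y = [max(floor, 0.5 * x) for x in xs]

good = tobit_loglik([0.0, 0.5], 1.0, X, y, floor)
bad = tobit_loglik([0.0, -0.5], 1.0, X, y, floor)
print(good > bad)
```

Fitting the model means handing this function to a numerical optimizer and searching over the coefficients and sigma, which is what statistical packages do for you.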

What do economists read?

Of course, mine and Wolfers’ are both pretty crude analyses. There are only two reasons his was published on a major news site and mine was buried over here on an obscure sociology blog: (a) he writes for a major news site, and (b) his weak analysis lends itself to an emerging snarky narrative in which rich leftists are seen to whine about inequality but real people can’t be bothered (the main point of Winship’s review) — just reinforcing the echo-chamber model of knowledge consumption that people who are into “data-driven” news like to appear to have risen above.

For a real explanation, Wolfers (and Winship) need look no further than the rest of the Google Correlate results page to see the obvious fact that searches for Piketty are simply correlated with interest in economics. Here’s the search that is most highly correlated with searches for “piketty” across U.S. states: “world bank gdp” (r=.98):


Here are some other searches correlated with “piketty” at .94 or higher:

economic consulting firms
eu data protection
exchange rate data
gdp by sector
inflation target
journal of labor economics
london school economics
nber working paper
oecd statistics
oxford economics
panel data stata
stock market capitalization
the economist intelligence unit
us current account deficit
world bank statistics

Well, there goes your rich, liberal, “American left” theory of who’s driving the Piketty phenomenon. It might be true, but it’s not confirmed by the Google search data. My hot new theory: college educated people who are also interested in economics are disproportionately interested in Piketty.

* The reviewer pool: Mervyn King (The Telegraph), Paul Krugman (New York Review of Books), Tyler Cowen (Foreign Affairs), James K. Galbraith (Dissent), Daniel Schuchman (Wall Street Journal), Justin Fox (Harvard Business Review), Michael Tanner (National Review), John Cassidy (New Yorker), Martin Wolf (Financial Times), Jordan Weissmann (Slate), Steven Pearlstein (Washington Post), Scott Winship (National Review), Heather Boushey (Challenge)


What do doctors, lawyers, police, and librarians Google?

Now with college teachers!

What do doctors, lawyers, police, and librarians Google? I’ll tell you. But first — if you are going to take this too seriously, please stop now.

Data and Method

Using IPUMS to extract data from the 2010-2012 American Community Survey, I count the number of people ages 25-64, currently employed, in a given occupation. I divide that by each state’s population in that age range (excluding Washington DC from all analyses). I enter those numbers into the Google Correlate tool to see which searches are most highly correlated with the distribution of each occupation across states (the tool reports the top 100 most correlated searches). In other words, these are searches that maximize the difference between, for example, high-lawyer and low-lawyer states — searches that are relatively popular where there are a lot of lawyers, and relatively unpopular where there are not a lot of lawyers.
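The logic of that procedure is simple enough to sketch. Google Correlate did the ranking on its own servers, so what follows is only my guess at the general idea, with hypothetical function names and invented numbers: compute the occupation's share of adults in each state, then rank candidate search series by their Pearson correlation with that share.

```python
import math


def per_adult_share(workers_by_state, adults_by_state):
    """Workers in an occupation divided by the state's population ages 25-64."""
    return {s: workers_by_state[s] / adults_by_state[s] for s in workers_by_state}


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


def rank_searches(target_by_state, searches, top_n=100):
    """Return the top_n (term, r) pairs most positively correlated with target."""
    states = sorted(target_by_state)
    target = [target_by_state[s] for s in states]
    scored = []
    for term, by_state in searches.items():
        series = [by_state[s] for s in states]
        scored.append((pearson_r(target, series), term))
    scored.sort(reverse=True)
    return [(term, r) for r, term in scored[:top_n]]


# Invented numbers for three made-up states, for illustration only.
workers = {"A": 10, "B": 30, "C": 20}
adults = {"A": 1000, "B": 1000, "C": 1000}
searches = {
    "x": {"A": -1.0, "B": 1.0, "C": 0.0},
    "y": {"A": 1.0, "B": -1.0, "C": 0.0},
    "z": {"A": 0.0, "B": 0.5, "C": 1.0},
}

ranked = rank_searches(per_adult_share(workers, adults), searches, top_n=2)
print(ranked)
```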

Is this what lawyers actually Google? We can’t know. But I think so. Or maybe what people who work in law firms do, or people who live with lawyers. It’s a very sensitive tool. I made this case first in the post, Stuff White People Google. Check that out if you’re skeptical.

For each occupation, I first offer a few highly correlated searches that support the idea that the data are capturing what these people search for. Then I list some of the interesting other hits from each list.



Police per adult


The map of police per adult looks pretty random, but the list of correlated search terms doesn’t. On the list are “security training,” “tsa jobs,” “waist belt,” “weight vest,” and “air marshals.”

After all the security stuff, the only major category left in the 100 searches most correlated with police in the population is women. Specifically, their search taste includes tough actress Rachel Ticotin, body builder Denise Masino, Brazilian actress Alice Braga, actress Rosario Dawson, and, “israeli women.” (Remember, Google suppresses known porn terms, so this is just what got through the filter.) It’s a leap from this data to the statement, “police search for images of these women,” but this is who they would find if that were the case (is this a “type”?):



Librarians per adult


On the other hand, librarians. They are the smallest occupation I tried: the average state population aged 25-64 is only one tenth of one percent librarians. Yet, their distribution leaves an unmistakable trace in the Google search patterns. It especially seems to pick up terms associated with public libraries. Correlated terms include, “cataloguing,” and “quiet hours.” And then there are terms one might ask a librarian about, classic reference-desk questions such as, “which vs that,” “turn off track changes,” “think tanks,” “9/11 commission,” and “irs form 6251”; and term paper topics like Shakespeare titles or “human development report.”

What about the librarians themselves, or those close to them? Could it be they who are searching for Ann Taylor dresses, Garnet Hill free shipping, Lands End home, and textile museums? We can’t know for sure. Of course, if anyone knows how to cover their search tracks, it might be this crowd.


Doctors per adult


You know they’re doctors, because the search terms most correlated with the map include “md, mph,” “md, phd,” “nejm,” “journal medicine,” “tedmed,” and “groopman.” What else do they like? Chick Corea, Tina Fey, Larry David, Mad Men (season 1) and The West Wing, Laura Linney, John Oliver, Scrabble 2-letter words, and a bunch of Jewish stuff.


Lawyers per adult


That’s the map of lawyers per adult across states. Is it really lawyers? The top 100 searches correlated with the distribution shown above include “general counsel,” and then a lot of financial terms like, “world economic forum,” “international finance corporation,” and “economist intelligence.” Then there are international travel terms, like, “rate euro dollar,” “royal air,” and “swiss embassy.”

Looks like lawyers in lawyer-land are richer and more finance-oriented than lawyers in general. On the cultural side, they search for clothing terms Massimo Dutti, Hugo Boss, and Benetton. They apparently like to eat at Zafferano in London, and drink Caipirinhas. Also, they like “vissi,” which is an aria from Tosca but also a Cypriot celebrity; I lean toward the latter, because Queen Rania is also on the list. Finally, they combine their interests in law, finance, and wealthy attractive women by searching for Debrahlee Lorenzana, the “too-hot-for-work” banker.

By popular demand: Post-secondary teachers


Finally, here without comment are the results for “post-secondary teachers,” which includes any college teacher who didn’t instead specify a specialty, such as “psychologist” or “economist.” (It’s hard to see on the map, but Rhode Island is the highest.) I broke the results into four rough categories:


bmi index
body image
citation style
critical theory
debt to equity
debt to equity ratio
democracy in america
economic inequality
economic statistics
edward elgar
effect size
email forward
equals sign
google scholar
growth rates
inflation rate
inflation rates
international study
journal of
journal of nutrition
marginal propensity
marginal propensity to consume
meters per second
piano sonata
prefrontal cortex
profile of
psychology studies
quick ratio
rejection letter
returns to scale
ways to end a letter


1% milk
2006 olympics
best pump up songs
crib safety
easy halloween costume
graco snug
ipod history
jackson superbowl
janet jackson superbowl
mastermind game
maxim online
most popular names
national sleep foundation
olympic figure skating
olympics 2006
pairs figure skating
sandra boynton
senior hockey
snl clips
stuff magazine
stumbled upon
toilet training


1812 overture
acapella group
acapella groups
africa toto
ave verum
for the longest time
it breaks my heart
pdq bach
taylor swift

Birth control

apri birth control


Poor social scientists, generations of them spending their lives raising a few thousand dollars to ask a few thousand people a few hundred stilted, arbitrary survey questions. Meanwhile, coursing through the cable wires below their feet, and through the air around them, billions of data bits carry so much more potential information about so many more people, in so many intimate aspects of their lives, than we could even dream of getting our hands on. Just think of the power!

Note: I’ve done many posts like this. Some use time series instead of geographic variation, some use terms from Google Books ngrams. Browse the series under the Google tag, or check out this selection:




Who’s worried about abstinence?

Probing the deep structure of the collective psyche, or just noise? Either way, kind of interesting.

Are people Googling “abstinence” worried more generally about children’s behavior — maybe their own children’s behavior? Compare the pattern across states in Google searches for “abstinence” and “b. f. skinner” (correlation .79)*:


Searches for “abstinence” (left) and “b. f. skinner” (right)

Out of the top 100 most-correlated-with-“abstinence” searches, these are the others that plausibly have to do with children’s behavior (correlated between .79 and .87):

attention deficit disorder
attention deficit hyperactivity
attention deficit hyperactivity disorder
b.f. skinner
behavior problems
girls basketball team
hyperactivity disorder
student motivation

My mental image here is one of parental desperation, a parent who one day is thinking of how to get her daughter onto the girls’ basketball team and Googling “student motivation,” and the next day is back to “punishment.”

Two other things about the Abstinence Searchers. One is they may be health worriers generally, and/or have health problems (or live in communities with these problems), because these are also in the top 100:

bowel syndrome
cancer facts
coping with
coping with stress
effects of drinking
eye disorders
gastric ulcers
heart attacks
heart disease
infant death
infant death syndrome
irritable bowel syndrome
muscular dystrophy
reflux disease
sleeping disorders
sudden infant
sudden infant death
sudden infant death syndrome

This second list makes me more sympathetic to the Abstinence Searchers. On the other hand, it looks like there is a lot of homeschooling going on here as well (the correlation of “abstinence” with “homeschooling” is .54, not in the top 100 but pretty good). These are also in the top 100:

activities for
activities for preschool
activities for preschoolers
activities for students
classroom activities
classroom activity
educational activities
list of famous people
list of the 50 states
projects for students
pronunciation of
textbook publishers
word games

I am not in favor of abstinence education because it doesn’t serve children well, and I like the idea of children taught complete information by trained professionals. I would never draw conclusions from this kind of superficial analysis, but it’s a little depressing.

* Note, perhaps due to an outbreak of abstinence education in Mississippi, the number of searches there was an outlier, so I top-coded Mississippi at just over the level of the next-highest state, South Dakota.
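Top-coding of this kind is a one-liner in most packages. Here is a small sketch of the idea with invented values; only the state names come from the note above, and the function name and the size of the margin are my own assumptions.

```python
def top_code_outlier(values_by_state, outlier, margin=0.1):
    """Cap one outlying state just above the next-highest state's value.

    `margin` is how far above the runner-up the cap sits (an arbitrary choice).
    """
    others = [v for s, v in values_by_state.items() if s != outlier]
    cap = max(others) + margin
    capped = dict(values_by_state)
    capped[outlier] = min(values_by_state[outlier], cap)
    return capped


# Invented search-intensity scores, for illustration only.
scores = {"Mississippi": 5.0, "South Dakota": 2.0, "Ohio": 0.5}
capped = top_code_outlier(scores, "Mississippi")
print(capped)
```

The point of doing this is to keep one extreme case from dominating the correlations while still leaving it at the top of the ranking.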


Here it is, your moment of White

A couple years ago, in a post called “Stuff White People Google,” I showed which Google search patterns were most highly correlated with the representation of different race/ethnic groups in the Census. That was a much better post than this.

This is a moment-of-White followup.

Here are Whites, by county, from this tool:


Here are the searches for “back in black,” from Google Correlate:


Google searches for “back in black”

And here is the correlation between searches for “back in black” and searches for “kitten pictures,” by state:


The scales are normed to a mean of 0 and standard deviation of 1 by Google, I think. I made the graph in Stata with this command (which I’m putting here because I always forget this syntax):

gr twoway scatter backinblack kittenpictures, mlabel(state) mlabposition(0) msymbol(i)

Random question

So, if it is Whites doing the searching for “back in black” and “kitten pictures,” is it possible that the searches are going on in the same households with some kind of gender division?


Don’t let that selectively-chosen picture fool you. According to the Alexa web traffic site, visitors to acdc.com skew only slightly male. And Facebook tells me I can reach a mostly- but not overwhelmingly-male mix of 3 million women versus 4 million men if I target people with an interest in AC/DC for an ad. (However, if people Googling AC/DC are looking for guitar tabs, maybe it’s the intersection of guitar and AC/DC as interests that matter.)

On the other hand, cuteoverload.com, which is loaded with kitten pictures, skews strongly female, and Facebook tells me that “cat pictures” as an interest will attract women more than men at a ratio of 4-to-1 (much more skewed than the general interest in cats: 1.5-to-1).

Anyway, this might not be the best case. I wonder what other examples there might be of a specific group (e.g., Whites) being divided between men who have a uniquely strong interest in something (AC/DC) and women who have a uniquely strong interest in something else (kitten pictures), with low overlap between the genders. That would be neat – intersectionality seen in Google search patterns.


Anyway, it’s time for another year of graduate student admissions. If you or someone you know likes playing with data and making graphs, pursuing hunches about social patterns (more or less important than the ones here), and reading and writing a lot, maybe you or your friend should be in next year’s pile of applications.


What’s been queered?

How much has the term, and concept, of queer penetrated the discourse of sexuality, politics and identity?

In the overall use of the terms queer sexuality, queer politics, and queer identity, according to the Google ngrams database of American English usage, queer politics occurs most often, and queer sexuality is last.

Source: Google ngrams.

On the other hand, as a fraction of references to politics, identity, and sexuality respectively — what you could call the relative penetration of queer — the order is different: queer sexuality has most successfully entered the discourse on sexuality, with queer politics and queer identity quite behind in their relative niches:


Source: Google ngrams.

(In all of these I used both capitalized and un-capitalized versions. Follow the links to modify the codes yourself.)


