Tag Archives: google

6 family correlations that will blow your mind or break your heart, and that probably aren’t spurious

The title is supposed to be funny.

Some data trends and patterns are correlated just by chance, such as the trends in high fructose corn syrup consumption and the Florida divorce rate. But there are other correlations that, although they seem highly improbable and you might never have predicted them, are not actually spurious. For finding those, there is Google Correlate. Out of the billions of possible correlations with the first term in these pairs – either across states or over time – each of these was in the top 100. The possibility that they are non-spurious is reinforced by the fact that each of these lists includes other similar terms in the top 100.
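For anyone who wants to check an r value like the ones below by hand, here is a minimal sketch of the Pearson correlation. The five "state" values are invented for illustration; they are not Google Correlate data, and the function name is mine.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical normalized search volumes for five states (not real data).
am_i_pregnant = [1.2, -0.4, 0.3, -1.1, 0.0]
ways_to_get_pregnant = [1.0, -0.5, 0.4, -0.9, 0.0]

r = pearson_r(am_i_pregnant, ways_to_get_pregnant)
print(round(r, 2))
```

A high r here only says the two series rise and fall together across states; it says nothing about who is doing the searching.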

Searches for “am I pregnant” and “ways to get pregnant,” by state (r=.96):

amipregnant


Searches for “divorce lawyer” and “maserati price,” by state (r=.83):

divorcelawyer


Searches for “discipline children” and “marriage problems,” by state (r=.89):

disciplinechildren


Searches for “office jobs” and “bob haircut,” by week (r=.88):

officejobs


Searches for “vasectomy cost” and “how old is johnny depp,” by state (r=.87):

vasectomycost


Searches for “man caves” and “penny from big bang theory,” by state (r=.85):

mancaves

For my whole series of Google-related posts, follow the tag.

 

2 Comments

Filed under Uncategorized

Education, not income, drives Piketty searches

Proving once again that effort is not always correlated with income, I present this critique of a Justin Wolfers blog post…

A lot of people have written reviews of Piketty. The first few pages of a Google search revealed all these (I added Heather Boushey, who wrote a good one)*:

piketty-reviewers

I believe that is diversity, because every human being is different.

Anyway, where to begin? Justin Wolfers wrote a little post, not a review, but it caught my attention. The headline was, “Piketty’s Book on Wealth and Inequality Is More Popular in Richer States.” Distractable, that’s where I began.

Wolfers’ culminating line, “Vive la révolution!”, suited Scott Winship, who looked over Wolfers’ figures before sniping, “the buzz around the book has come mostly from rich liberal states along the Boston-to-Washington corridor.” But I think they’re both misinterpreting.

According to the Google search data Wolfers used, these were the top 10 states for “piketty” searches (Washington, D.C. excluded): Massachusetts, New York, Connecticut, Maryland, New Jersey, Illinois, Pennsylvania, Wisconsin, Oregon, California.

It looks to me like it’s actually education driving the search data. And that is a big difference. Let me explain.

Do data?

Microsoft Word tells me that the reading grade level of the publisher’s excerpt is 16.3, so it takes a 16th-grade education to read it. (Note that the “Boston-to-Washington corridor,” which was supposed to sound like a small sliver of the country, has 26% of the country’s college graduates.) So consider income versus college completion, which we can now take as a proxy for being able to read Piketty.

Wolfers writes, “I can’t tell you where Piketty has been least popular, because below a certain level of search activity, Google doesn’t release the actual numbers.” So he proceeds to leave 24 states out of his analysis (this will become important). Using per-capita income (converted to z-scores), and dropping 24 states plus the ridiculous outlier of DC, this is Wolfers’ income result (my calculations; he just showed scatter plots):

pik1
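Here is a rough sketch of that kind of calculation: z-score the income variable, drop the censored states, and fit a one-predictor regression. The rows are made up, the function names are mine, and using the sample standard deviation (and z-scoring before dropping cases) are assumptions, not a record of what produced the table above.

```python
import math

def zscore(values):
    """Convert values to z-scores using the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def ols_fit(x, y):
    """Slope and intercept of a one-predictor least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical (state, per-capita income, search index) rows.
# None marks a state whose search value Google did not release.
rows = [("A", 62000, 2.1), ("B", 55000, 1.0), ("C", 48000, 0.2),
        ("D", 41000, None), ("E", 39000, None)]

income_z = zscore([income for _, income, _ in rows])
kept = [(z, s) for z, (_, _, s) in zip(income_z, rows) if s is not None]
slope, intercept = ols_fit([z for z, _ in kept], [s for _, s in kept])
print(slope, intercept)
```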

OK, leaving out the bottom half of the Piketty distribution, there is a strong positive relationship between per capita income and Piketty Google searches. Congratulations, you can have three jobs as an economist!

I kid Wolfers. But, come on! I don’t know what kind of data operation they’re running over there at the Upshot, but I would expect Wolfers to take it up a notch. First, control for college completion (percent of folks ages 25+ with a BA or more, also z-scored). See how it shows… oops:

pik2

The income effect is reduced but the education effect isn’t significant. (See how I showed you that instead of just going right to the results that support my argument?)

But go back to Wolfers leaving out the bottom half of the Piketty distribution. What’s wrong with that? I’m sure there’s some statistical way of explaining that, but just eyeballing it you’d have to say dropping those cases could cause trouble. The censored cases all have values of -.64 on the search variable. The relationship with income is weaker when the censored cases are included (shown in the red line) versus when he limits it to the top half of Piketty states (blue line):

pik-scatter1

What to do about this? An easy thing is just to include the censored cases at their values of -.64, just pretending -.64 is a legitimate value. That gives:

pik3

Now the income effect is reduced by about three-quarters, and the college completion effect is three times as large (with a t-statistic to match).

But that’s not the best way to handle this. If only economists had invented a way of modeling data with censored dependent variables! Just kidding: there’s Tobin’s Tobit. This kind of model says, I see your censored dependent variable, and I crash it through the bottom of the distribution as a function of its linear relationship to your independent variables. So instead of all being -.64, it lets the censored cases be as low as they want to be, with values predicted by income and college completion. Sort of. Anyway, here’s that result:

pik4
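For the curious, this is roughly what a Tobit likelihood looks like when the dependent variable is censored from below. It is a sketch with one predictor and invented data, not the model behind the table above (that one had income and college completion), and it only evaluates the log-likelihood at parameters you hand it rather than maximizing it. The censoring point of -.64 is the one value taken from the post; everything else is illustrative.

```python
import math

def norm_logpdf(z):
    """Log density of the standard normal at z."""
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_loglik(beta0, beta1, sigma, xs, ys, lower):
    """Log-likelihood of a left-censored (Tobit) model with one predictor."""
    total = 0.0
    for x, y in zip(xs, ys):
        mu = beta0 + beta1 * x
        if y <= lower:
            # Censored case: all we know is that the latent value is <= lower.
            total += math.log(max(norm_cdf((lower - mu) / sigma), 1e-300))
        else:
            # Observed case: ordinary normal density of the residual.
            total += norm_logpdf((y - mu) / sigma) - math.log(sigma)
    return total

# Hypothetical z-scored predictor and search index; -0.64 marks censored states.
college_z = [1.5, 0.8, 0.1, -0.7, -1.2]
searches = [1.9, 0.9, 0.0, -0.64, -0.64]

steep = tobit_loglik(0.0, 1.2, 0.3, college_z, searches, lower=-0.64)
flat = tobit_loglik(0.0, 0.0, 0.3, college_z, searches, lower=-0.64)
print(steep > flat)
```

The censored states contribute only the probability of falling at or below -.64, which is what lets their predicted values go as low as the predictors imply.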

Now income is crushed, reduced to literal insignificance. What matters is the percentage of the population that has completed college. It’s not that rich people like Piketty, it’s that college graduates do. Maybe because that’s who can read it. (I don’t know, I haven’t tried.)

What do economists read?

Of course, mine and Wolfers’ are both pretty crude analyses. There are only two reasons his was published on a major news site and mine was buried over here on an obscure sociology blog: (a) he writes for a major news site, and (b) his weak analysis lends itself to an emerging snarky narrative in which rich leftists are seen to whine about inequality but real people can’t be bothered (the main point of Winship’s review) — just reinforcing the echo-chamber model of knowledge consumption that people who are into “data-driven” news like to appear to have risen above.

For a real explanation, Wolfers (and Winship) need look no further than the rest of the Google Correlate results page to see the obvious fact that searches for Piketty are simply correlated with interest in economics. Here’s the search that is most highly correlated with searches for “piketty” across U.S. states: “world bank gdp” (r=.98):

pik-scatter2

Here are some other searches correlated with “piketty” at .94 or higher:

economic consulting firms
eu data protection
exchange rate data
gdp by sector
inflation target
journal of labor economics
london school economics
nber working paper
oecd statistics
oxford economics
panel data stata
stock market capitalization
the economist intelligence unit
us current account deficit
world bank statistics

Well, there goes your rich, liberal, “American left” theory of who’s driving the Piketty phenomenon. It might be true, but it’s not confirmed by the Google search data. My hot new theory: college educated people who are also interested in economics are disproportionately interested in Piketty.

* The reviewer pool: Mervyn King (The Telegraph), Paul Krugman (New York Review of Books), Tyler Cowen (Foreign Affairs), James K. Galbraith (Dissent), Daniel Schuchman (Wall Street Journal), Justin Fox (Harvard Business Review), Michael Tanner (National Review), John Cassidy (New Yorker), Martin Wolf (Financial Times), Jordan Weissmann (Slate), Steven Pearlstein (Washington Post), Scott Winship (National Review), Heather Boushey (Challenge)

3 Comments

Filed under Uncategorized

What do doctors, lawyers, police, and librarians Google?

Now with college teachers!

What do doctors, lawyers, police, and librarians Google? I’ll tell you. But first — if you are going to take this too seriously, please stop now.

Data and Method

Using IPUMS to extract data from the 2010-2012 American Community Survey, I count the number of people ages 25-64, currently employed, in a given occupation. I divide that by each state’s population in that age range (excluding Washington DC from all analyses). I enter those numbers into the Google Correlate tool to see which searches are most highly correlated with the distribution of each occupation across states (the tool reports the top 100 most correlated searches). In other words, these are searches that maximize the difference between, for example, high-lawyer and low-lawyer states — searches that are relatively popular where there are a lot of lawyers, and relatively unpopular where there are not a lot of lawyers.
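The per-state number fed into the tool is just a share. A minimal sketch, with invented counts and a hypothetical function name (the real inputs would come from the IPUMS extract):

```python
def occupation_share(workers_by_state, population_by_state, exclude=("DC",)):
    """Employed people in an occupation divided by state population, ages 25-64."""
    return {
        state: workers_by_state[state] / population_by_state[state]
        for state in workers_by_state
        if state not in exclude
    }

# Hypothetical counts of employed lawyers and of all residents ages 25-64.
lawyers = {"NY": 90000, "MD": 25000, "DC": 30000}
adults = {"NY": 10500000, "MD": 3200000, "DC": 380000}

shares = occupation_share(lawyers, adults)
print(sorted(shares, key=shares.get, reverse=True))
```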

Is this what lawyers actually Google? We can’t know. But I think so. Or maybe what people who work in law firms do, or people who live with lawyers. It’s a very sensitive tool. I made this case first in the post, Stuff White People Google. Check that out if you’re skeptical.

For each occupation, I first offer a few highly correlated searches that support the idea that the data are capturing what these people search for. Then I list some of the interesting other hits from each list.

Results

Police

Police per adult


The map of police per adult looks pretty random, but the list of correlated search terms doesn’t. On the list are “security training,” “tsa jobs,” “waist belt,” “weight vest,” and “air marshals.”

After all the security stuff, the only major category left in the 100 searches most correlated with police in the population is women. Specifically, their search taste includes tough actress Rachel Ticotin, body builder Denise Masino, Brazilian actress Alice Braga, actress Rosario Dawson, and, “israeli women.” (Remember, Google suppresses known porn terms, so this is just what got through the filter.) It’s a leap from this data to the statement, “police search for images of these women,” but this is who they would find if that were the case (is this a “type”?):

policewomensearches

Librarians

Librarians per adult


On the other hand, librarians. They are the smallest occupation I tried: the average state population aged 25-64 is only one tenth of one percent librarians. Yet, their distribution leaves an unmistakable trace in the Google search patterns. It especially seems to pick up terms associated with public libraries. Correlated terms include, “cataloguing,” and “quiet hours.” And then there are terms one might ask a librarian about, classic reference-desk questions such as, “which vs that,” “turn off track changes,” “think tanks,” “9/11 commission,” and “irs form 6251”; and term paper topics like Shakespeare titles or “human development report.”

What about the librarians themselves, or those close to them? Could it be they who are searching for Ann Taylor dresses, Garnet Hill free shipping, Lands End home, and textile museums? We can’t know for sure. Of course, if anyone knows how to cover their search tracks, it might be this crowd.

Doctors

Doctors per adult


You know they’re doctors, because the search terms most correlated with the map include “md, mph,” “md, phd,” “nejm,” “journal medicine,” “tedmed,” and “groopman.” What else do they like? Chick Corea, Tina Fey, Larry David, Mad Men (season 1) and The West Wing, Laura Linney, John Oliver, Scrabble 2-letter words, and a bunch of Jewish stuff.

Lawyers

Lawyers per adult


That’s the map of lawyers per adult across states. Is it really lawyers? The top 100 searches correlated with the distribution shown above include “general counsel,” and then a lot of financial terms like, “world economic forum,” “international finance corporation,” and “economist intelligence.” Then there are international travel terms, like, “rate euro dollar,” “royal air,” and “swiss embassy.”

Looks like lawyers in lawyer-land are richer and more finance-oriented than lawyers in general. On the cultural side, they search for clothing terms Massimo Dutti, Hugo Boss, and Benetton. They apparently like to eat at Zafferano in London, and drink Caipirinhas. Also, they like “vissi,” which is an aria from Tosca but also a Cypriot celebrity; I lean toward the latter, because Queen Rania is also on the list. Finally, they combine their interests in law, finance, and wealthy attractive women by searching for Debrahlee Lorenzana, the “too-hot-for-work” banker.

By popular demand: Post-secondary teachers

postsecondaryperadult

Finally, here without comment are the results for “post-secondary teachers,” which includes any college teacher who didn’t instead specify a specialty, such as “psychologist” or “economist.” (It’s hard to see on the map, but Rhode Island is the highest.) I broke the results into four rough categories:

Academic

attribution
balderdash
bmi index
body image
citation style
cpdl
critical theory
debt to equity
debt to equity ratio
democracy in america
dihedral
economic inequality
economic statistics
economists
educause
edward elgar
effect size
email forward
equals sign
exogenous
feminists
google scholar
growth rates
homomorphism
inflation rate
inflation rates
intelligibility
international study
isomorphic
journal of
journal of nutrition
marginal propensity
marginal propensity to consume
mediating
meters per second
milieu
overlaying
piano sonata
prefrontal
prefrontal cortex
profile of
psychology studies
quick ratio
rejection letter
returns to scale
routledge
scholar
subgroup
superscript
transglutaminase
ways to end a letter

Personal

1% milk
2006 olympics
best pump up songs
crib safety
easy halloween costume
graco snug
handel
ipod history
jackson superbowl
janet jackson superbowl
mastermind game
maxim online
minesweeper
most popular names
napping
national sleep foundation
olympic figure skating
olympics 2006
pairs figure skating
positioning
refereeing
sandra boynton
senior hockey
snl clips
stuff magazine
stumbled upon
toilet training
verum

Musical

1812 overture
acapella group
acapella groups
africa toto
ave verum
for the longest time
it breaks my heart
pdq bach
taylor swift

Birth control

apri
apri birth control
aviane

Conclusions

Poor social scientists, generations of them spending their lives raising a few thousand dollars to ask a few thousand people a few hundred stilted, arbitrary survey questions. Meanwhile, coursing through the cable wires below their feet, and through the air around them, billions of data bits carry so much more potential information about so many more people, in so many intimate aspects of their lives, than we could even dream of getting our hands on. Just think of the power!

Ringfrodo

Note: I’ve done many posts like this. Some use time series instead of geographic variation, some use terms from Google Books ngrams. Browse the series under the Google tag, or check out this selection:

 

 

2 Comments

Filed under Me @ work

Who’s worried about abstinence?

Probing the deep structure of the collective psyche, or just noise? Either way, kind of interesting.

Are people Googling “abstinence” worried more generally about children’s behavior — maybe their own children’s behavior? Compare the pattern across states in Google searches for “abstinence” and “b. f. skinner” (correlation .79)*:

homeskinner

Searches for “abstinence” (left) and “b. f. skinner” (right)

Out of the top 100 most-correlated-with-“abstinence” searches, these are the others that plausibly have to do with children’s behavior (correlated between .79 and .87):

attention deficit disorder
attention deficit hyperactivity
attention deficit hyperactivity disorder
b.f.
b.f. skinner
behaviors
behavior problems
girls basketball team
hyperactivity
hyperactivity disorder
pregnancies
punishment
student motivation

My mental image here is one of parental desperation, a parent who one day is thinking of how to get her daughter onto the girls’ basketball team and Googling “student motivation,” and the next day is back to “punishment.”

Two other things about the Abstinence Searchers. One is they may be health worriers generally, and/or have health problems (or live in communities with these problems), because these are also in the top 100:

bowel syndrome
cancer facts
coping with
coping with stress
diseases
disorders
effects of drinking
eye disorders
gastric ulcers
heart attacks
heart disease
infant death
infant death syndrome
irritable bowel syndrome
muscular dystrophy
obesity
phenylketonuria
reflux disease
sleeping disorders
sudden infant
sudden infant death
sudden infant death syndrome

This second list makes me more sympathetic to the Abstinence Searchers. On the other hand, it looks like there is a lot of homeschooling going on here as well (the correlation of “abstinence” with “homeschooling” is .54, not in the top 100 but pretty good). These are also in the top 100:

activities for
activities for preschool
activities for preschoolers
activities for students
classroom activities
classroom activity
educational activities
list of famous people
list of the 50 states
projects for students
pronunciation of
pronunciations
textbook publishers
topics
well-known
word games
http://www.census.gov

I am not in favor of abstinence education because it doesn’t serve children well, and I like the idea of children taught complete information by trained professionals. I would never draw conclusions from this kind of superficial analysis, but it’s a little depressing.

* Note, perhaps due to an outbreak of abstinence education in Mississippi, the number of searches there was an outlier, so I top-coded Mississippi at just over the level of the next-highest state, South Dakota.
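Top-coding of that sort can be sketched like this. The state values are invented and the size of the "just over" margin is my assumption; the note above doesn't specify one.

```python
def top_code_outlier(values_by_state, state, margin=0.01):
    """Cap one state's value at slightly above the next-highest state's value."""
    next_highest = max(v for s, v in values_by_state.items() if s != state)
    capped = dict(values_by_state)
    capped[state] = min(values_by_state[state], next_highest + margin)
    return capped

# Hypothetical normalized search values (not the real Google Correlate numbers).
abstinence = {"Mississippi": 5.0, "South Dakota": 2.0, "Alabama": 1.5}
print(top_code_outlier(abstinence, "Mississippi"))
```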

4 Comments

Filed under Uncategorized

Here it is, your moment of White

A couple years ago, in a post called “Stuff White People Google,” I showed which Google search patterns were most highly correlated with the representation of different race/ethnic groups in the Census. That was a much better post than this.

This is a moment-of-White followup.

Here are Whites, by county, from this tool:

PercentWhite

Here are the searches for “back in black,” from Google Correlate:

backinblack

Google searches for “back in black”

And here is the correlation between searches for “back in black” and searches for “kitten pictures,” by state:

backinblackkittens

The scales are normed to a mean of 0 and standard deviation of 1 by Google, I think. I made the graph in Stata with this command (which I’m putting here because I always forget this syntax):

gr twoway scatter backinblack kittenpictures, mlabel(state) mlabposition(0) msymbol(i)

Random question

So, if it is Whites doing the searching for “back in black” and “kitten pictures,” is it possible that the searches are going on in the same households with some kind of gender division?

acdcfans

Don’t let that selectively-chosen picture fool you. According to the Alexa web traffic site, visitors to acdc.com skew only slightly male. And Facebook tells me I can reach a mostly- but not overwhelmingly-male mix of 3 million women versus 4 million men if I target people with an interest in AC/DC for an ad. (However, if people Googling AC/DC are looking for guitar tabs, maybe it’s the intersection of guitar and AC/DC as interests that matters.)

On the other hand, cuteoverload.com, which is loaded with kitten pictures, skews strongly female, and Facebook tells me that “cat pictures” as an interest will attract women more than men at a ratio of 4-to-1 (much more skewed than the general interest in cats: 1.5-to-1).

Anyway, this might not be the best case. I wonder what other examples there might be of a specific group (e.g., Whites) being divided between men who have a uniquely strong interest in something (AC/DC) and women who have a uniquely strong interest in something else (kitten pictures), with low overlap between the genders. That would be neat – intersectionality seen in Google search patterns.

So

Anyway, it’s time for another year of graduate student admissions. If you or someone you know likes playing with data and making graphs, pursuing hunches about social patterns (more or less important than the ones here), and reading and writing a lot, maybe you or your friend should be in next year’s pile of applications.

3 Comments

Filed under In the news

What’s been queered?

How much has the term, and concept, of queer penetrated the discourse of sexuality, politics and identity?

In the overall use of the terms queer sexuality, queer politics, and queer identity, according to the Google ngrams database of American English usage, queer politics occurs most often, and queer sexuality is last.

queer-use

Source: Google ngrams.

On the other hand, as a fraction of references to politics, identity, and sexuality respectively — what you could call the relative penetration of queer — the order is different: queer sexuality has most successfully entered the discourse on sexuality, with queer politics and queer identity quite behind in their relative niches:

queer-penetration

Source: Google ngrams.

(In all of these I used both capitalized and un-capitalized versions. Follow the links to modify the codes yourself.)
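The "relative penetration" figure is just a ratio: the frequency of the queer phrase divided by the frequency of the base term, each summed over capitalizations. A sketch with invented frequencies; the two capitalization variants used here (all lowercase and title case) are an assumption about which versions were combined.

```python
def combined_frequency(freqs, phrase):
    """Sum the lowercase and title-case frequencies of a phrase."""
    variants = {phrase.lower(), phrase.title()}
    return sum(freqs.get(v, 0.0) for v in variants)

def relative_penetration(freqs, modifier, base):
    """Frequency of 'modifier base' as a fraction of the frequency of 'base'."""
    phrase = f"{modifier} {base}"
    return combined_frequency(freqs, phrase) / combined_frequency(freqs, base)

# Hypothetical ngram frequencies (arbitrary units, not real ngrams data).
freqs = {
    "queer politics": 2.0, "Queer Politics": 1.0,
    "politics": 250.0, "Politics": 50.0,
}
print(relative_penetration(freqs, "queer", "politics"))
```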

4 Comments

Filed under In the news

Family Inequality marriage forecast contest

Enter to win: How many people will get married last year?

marriage-forecase-cartoon

An outfit called Demographic Intelligence, which I’ve written about before, got USA Today to do a story on their new U.S. Wedding Forecast™.

Although there’s no sign of him on the website anymore, DI was founded by W. Bradford Wilcox, according to the Wayback Machine‘s archive. Now it is reportedly run by Samuel Sturgeon, who did a little work for the Heritage Foundation while working on his PhD on welfare and abortion policy with David Eggebeen at Penn State (who joined Mark Regnerus in a Supreme Court brief opposing marriage equality).

Anyway, the wedding forecast is available for sale only, and the formula is a secret (the ™ is for “trust me”). But they leaked some details to USA Today.

The company projects a 4% increase in the number of weddings since 2009, reaching 2.168 million this year; 2.189 million in 2014. Depending on the economic recovery, the report projects a continuing increase to 2.208 million in 2015. … From 2007 to 2009, the number of marriages each year fell from 2.197 million to 2.080 million. The report estimates that more than 175,000 weddings have been postponed or foregone since the recession began.

OK, so the projection for 2013 is 2.168 million. The story doesn’t say what DI forecasts for 2012 — which has already happened, although the official number hasn’t been released. But we should be cautious before buying wedding futures, because, according to USA Today: “This is the company’s first foray into wedding forecasts.”

Do it yourself™

I’ve made a forecast, but like DI-LLC™, I’m sealing it till the end of the contest. But I’ll give you a few data points so you can enter your predicted number of marriages in the U.S. in 2012. The person whose prediction, posted in the comments, is closest to the actual number reported on this page will win a free Family Inequality t-shirt, if I ever get around to making them. In the event of a tie, the prediction posted earliest wins.

Here is the trend from 2000 to 2011. Observe: long-run decline, recession-spike down, then rebound.

marriage-trend

So, is the rebound just a little catching up from delayed marriage, or what? That’s the question. DI says 2,168,000 by 2013. Go marriage!

The USA Today story reports that DI’s forecast is “based on a variety of measures, including unemployment and consumer confidence.” I got some of that for you. I also added the number of women in the US ages 20-39 (who account for about 75% of marriages); these children of the Baby Boomers are producing a little population bulge which could bring more marriages even at falling rates.

In what could be bad news for the DI forecast, however, I also checked the Google search trends for “wedding invitations,” “bridal shower,” and “wedding gifts.” These are the trends, shown in 3-week moving averages, with each normed so that 100% was the most popular week (the originals are here). See the big rebound continuing in 2012? Me neither. Click to enlarge:

google-marriage-trends

I annualized those numbers for each year 2004 to 2013, with a seasonal adjustment for the first 24 weeks of the year (don’t ask).
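The smoothing behind the trend lines can be sketched as follows. The weekly values are invented, and the choices of a trailing (rather than centered) window and of norming to the peak before smoothing are assumptions on my part.

```python
def norm_to_peak(values):
    """Rescale so the most popular week equals 100."""
    peak = max(values)
    return [100.0 * v / peak for v in values]

def moving_average(values, window=3):
    """Trailing moving average; the first window-1 points are dropped."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

# Hypothetical weekly search volumes for one term (not real Google data).
weekly = [40, 55, 80, 60, 45, 50]
smoothed = moving_average(norm_to_peak(weekly), window=3)
print([round(v, 1) for v in smoothed])
```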

But are the Google numbers good for prediction? I used them to predict another down year for marriage in 2011. That wasn’t borne out by the vital statistics numbers, which rebounded (as shown in the chart above). On the other hand, the numbers from the American Community Survey showed the decline continuing into 2011, as reported by Pew. On the third hand, ACS shows continued decline in marriage rates (the difference is in the number of marriage-aged people). We don’t know enough about the difference between ACS and vital statistics to interpret this yet. Uncharted waters.

So, here are your numbers, with everything up to 2013 except the outcome: the number of marriages. Feel free to use these or anything else you like. Or just guess. Remember, Demographic Intelligence boasts of 99% accuracy, but except for 2009 you would have been at 97.5% or better just guessing no change — so you’re bound to be close. The contest is for 2012, but 2013 forecasts are welcome, too — better early than never. Click to enlarge:

marriage-forecast-data

To make it easier, I’ve uploaded the spreadsheet, with sources, here.
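If you want to score the "just guess no change" strategy yourself, here is one way to do it. The series below is invented (use the numbers in the spreadsheet instead), and defining accuracy as one minus the absolute error as a share of the actual number is my assumption about what an accuracy claim like that means.

```python
def no_change_accuracy(series):
    """Accuracy of predicting each year's value with the previous year's value."""
    return [
        1.0 - abs(current - previous) / current
        for previous, current in zip(series, series[1:])
    ]

# Hypothetical annual marriage counts in millions (not the real series).
marriages = [2.20, 2.16, 2.08, 2.10, 2.12]
for accuracy in no_change_accuracy(marriages):
    print(f"{accuracy:.1%}")
```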

In marketing terminology, these variables are very hot leads. Here are the correlations between each variable and the number of marriages, for the years 2004-2011:

marriage-correlations

For other posts about prediction, see:

Good luck!

3 Comments

Filed under Uncategorized