Tag Archives: google

6 family correlations that will blow your mind or break your heart, and that probably aren’t spurious

The title is supposed to be funny.

Some data trends and patterns are correlated just by chance, such as the trends in high fructose corn syrup consumption and the Florida divorce rate. But there are other correlations that, although they seem highly improbable and you might never have predicted them, are not actually spurious. For finding those, there is Google Correlate. Out of the billions of possible correlations with the first term in each of these pairs (either across states or over time), each of these was in the top 100. The possibility that they are not spurious is reinforced by the fact that each of these lists includes other, similar terms in the top 100.
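Correlate's own implementation is not described here (it had to search an enormous database quickly), so the following is only a brute-force sketch of the idea. Given one target series (a value per state, or per week) and many candidate search-term series aligned with it, it ranks the candidates by Pearson correlation and keeps the top k (100 in Correlate's case). The function name and the dictionary layout are assumptions for illustration.

```python
import numpy as np

def top_correlates(target, candidates, k=100):
    """Rank candidate series by Pearson r with the target; return the top k.

    target: sequence of numbers (e.g., one value per state).
    candidates: dict mapping a search term to a series aligned with target.
    Constant series (zero variance) are skipped, since r is undefined for them.
    """
    t = np.asarray(target, dtype=float)
    t = (t - t.mean()) / t.std()
    scores = []
    for term, series in candidates.items():
        s = np.asarray(series, dtype=float)
        if s.std() == 0:
            continue
        s = (s - s.mean()) / s.std()
        # The mean product of two z-scored series (population SD) is Pearson's r.
        scores.append((term, float(np.mean(t * s))))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]
```

Ranking by r alone is exactly what makes chance matches possible: with billions of candidates, some will score high for no reason, which is why the other similar terms in each top-100 list matter.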

Searches for “am I pregnant” and “ways to get pregnant,” by state (r=.96):



Searches for “divorce lawyer” and “maserati price,” by state (r=.83):



Searches for “discipline children” and “marriage problems,” by state (r=.89):



Searches for “office jobs” and “bob haircut,” by week (r=.88):



Searches for “vasectomy cost” and “how old is johnny depp,” by state (r=.87):



Searches for “man caves” and “penny from big bang theory,” by state (r=.85):

For my whole series of Google-related posts, follow the tag.

 


Filed under Uncategorized

Education, not income, drives Piketty searches

Proving once again that effort is not always correlated with income, I present this critique of a Justin Wolfers blog post…

A lot of people have written reviews of Piketty. The first few pages of a Google search revealed all these (I added Heather Boushey, who wrote a good one)*:

[Figure: the Piketty reviewers]

I believe that is diversity, because every human being is different.

Anyway, where to begin? Justin Wolfers wrote a little post, not a review, but it caught my attention. The headline was, “Piketty’s Book on Wealth and Inequality Is More Popular in Richer States.” Distractible, that’s where I began.

Wolfers’ culminating line, “Vive la révolution!”, suited Scott Winship, who looked over Wolfers’ figures before sniping, “the buzz around the book has come mostly from rich liberal states along the Boston-to-Washington corridor.” But I think they’re both misinterpreting.

According to the Google search data Wolfers used, these were the top 10 states for “piketty” searches (Washington, D.C. excluded): Massachusetts, New York, Connecticut, Maryland, New Jersey, Illinois, Pennsylvania, Wisconsin, Oregon, California.

It looks to me like it’s actually education driving the search data. And that is a big difference. Let me explain.

Do data?

Microsoft Word tells me that the reading grade level of the publisher’s excerpt is 16.3, so it takes a 16th-grade education to read it. (Note that the “Boston-to-Washington corridor,” which was supposed to sound like a small sliver of the country, has 26% of the country’s college graduates.) So consider income versus college completion, which we can now take as a proxy for being able to read Piketty.
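Word’s readability statistics report the Flesch-Kincaid Grade Level, which is presumably the number quoted here. The formula is 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. A minimal sketch follows; the syllable counter is a crude, hypothetical vowel-group heuristic, so it will not exactly reproduce Word’s figure for the same excerpt.

```python
import re

def count_syllables(word: str) -> int:
    """Crude heuristic: count runs of vowels (including y), minus a silent final 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def grade_level(text: str) -> float:
    """Approximate grade level of a passage, splitting sentences on . ! ?"""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return flesch_kincaid_grade(len(words), sentences, syllables)
```

Long sentences and polysyllabic words both push the score up, which is how an economics excerpt ends up at a 16th-grade level.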

Wolfers writes, “I can’t tell you where Piketty has been least popular, because below a certain level of search activity, Google doesn’t release the actual numbers.” So he proceeds to leave 24 states out of his analysis (this will become important). Using per-capita income (converted to z-scores), and dropping 24 states plus the ridiculous outlier of DC, this is Wolfers’ income result (my calculations; he just showed scatter plots):

[Table: Piketty searches regressed on per capita income, censored states and DC dropped]

OK, leaving out the bottom half of the Piketty distribution, there is a strong positive relationship between per capita income and Piketty Google searches. Congratulations, you can have three jobs as an economist!

I kid Wolfers. But, come on! I don’t know what kind of data operation they’re running over there at the Upshot, but I would expect Wolfers to take it up a notch. First, control for college completion (percent of folks ages 25+ with a BA or more, also z-scored). See how it shows… oops:

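[Table: Piketty searches regressed on per capita income and college completion, censored states dropped]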

The income effect is reduced but the education effect isn’t significant. (See how I showed you that instead of just going right to the results that support my argument?)

But go back to Wolfers leaving out the bottom half of the Piketty distribution. What’s wrong with that? I’m sure there’s some statistical way of explaining it, but just eyeballing it you’d have to say dropping those cases could cause trouble. The censored cases all have values of -.64 on the search variable. The relationship with income is weaker when the censored cases are included (the red line) than when he limits it to the top half of Piketty states (the blue line):

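[Figure: Piketty searches by per capita income, with fit lines for all states (red) and uncensored states only (blue)]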

What to do about this? An easy thing is just to include the censored cases at their values of -.64, just pretending -.64 is a legitimate value. That gives:

[Table: regression results with censored states included at -.64]

Now the income effect is reduced by about three-quarters, and the college completion effect is three times as large (with t-statistics to match).
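The two ways of handling the censored states discussed so far (dropping them, as Wolfers did, or keeping them at -.64) can be compared with ordinary least squares. A minimal sketch, assuming state-level arrays for searches, per capita income, and college completion; the function names are hypothetical, and the z-scores use the population standard deviation, which the post doesn’t specify.

```python
import numpy as np

def zscore(x):
    """Standardize to mean 0, SD 1 (population SD, an assumption)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def ols(y, *predictors):
    """OLS coefficients [intercept, b1, b2, ...] via least squares."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def compare_censoring(searches, income, college, floor=-0.64):
    """Fit searches ~ income + college two ways.

    dropped:  censored states (those at the floor) removed, the sample Wolfers used
    at_floor: censored states kept, treating the floor as a real value
    """
    searches = np.asarray(searches, dtype=float)
    income, college = zscore(income), zscore(college)
    keep = searches > floor                  # uncensored states only
    dropped = ols(searches[keep], income[keep], college[keep])
    at_floor = ols(searches, income, college)
    return dropped, at_floor
```

Neither version is right: dropping throws away the information that those states are low, and keeping them at the floor pretends their true values are all identical.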

But that’s not the best way to handle this. If only economists had invented a way of modeling data with censored dependent variables! Just kidding: there’s Tobin’s Tobit. This kind of model says, I see your censored dependent variable, and I crash it through the bottom of the distribution as a function of its linear relationship to your independent variables. So instead of all being -.64, it lets the censored cases be as low as they want to be, with values predicted by income and college completion. Sort of. Anyway, here’s that result:

[Table: Tobit regression results]

Now income is crushed, reduced to literal insignificance. What matters is the percentage of the population that has completed college. It’s not that rich people like Piketty, it’s that college graduates do. Maybe because that’s who can read it. (I don’t know, I haven’t tried.)
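A Tobit model can be fit by maximum likelihood: uncensored states contribute the normal density of their observed value, while states sitting at the floor contribute the probability that their latent value falls at or below the floor. The sketch below writes that likelihood out with NumPy and SciPy rather than calling a packaged Tobit routine; the function names are hypothetical, and the post does not say which software produced its estimates.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def tobit_negloglik(params, y, X, floor):
    """Negative log-likelihood of a Tobit model left-censored at `floor`."""
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)              # optimize log(sigma) so sigma stays positive
    xb = X @ beta
    censored = y <= floor
    loglik = np.where(
        censored,
        norm.logcdf((floor - xb) / sigma),                # P(latent value <= floor)
        norm.logpdf((y - xb) / sigma) - np.log(sigma),    # density of an observed value
    )
    return -loglik.sum()

def fit_tobit(y, predictors, floor):
    """Return (coefficients [intercept, b1, ...], sigma) by maximum likelihood."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), *predictors])
    start_beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS estimates as starting values
    start = np.append(start_beta, np.log(y.std()))
    result = minimize(tobit_negloglik, start, args=(y, X, floor), method="BFGS")
    return result.x[:-1], float(np.exp(result.x[-1]))
```

For the Piketty data the call would look like `fit_tobit(searches, [income_z, college_z], floor=-0.64)`, where the argument names are placeholders. The censored states are no longer forced to sit at -.64; the likelihood lets their latent values fall below the floor as far as the predictors imply.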

What do economists read?

Of course, Wolfers’ analysis and mine are both pretty crude. There are only two reasons his was published on a major news site and mine was buried over here on an obscure sociology blog: (a) he writes for a major news site, and (b) his weak analysis lends itself to an emerging snarky narrative in which rich leftists are seen to whine about inequality but real people can’t be bothered (the main point of Winship’s review). That narrative just reinforces the echo-chamber model of knowledge consumption that people who are into “data-driven” news like to appear to have risen above.

For a real explanation, Wolfers (and Winship) need look no further than the rest of the Google Correlate results page to see the obvious fact that searches for Piketty are simply correlated with interest in economics. Here’s the search that is most highly correlated with searches for “piketty” across U.S. states: “world bank gdp” (r=.98):

[Figure: searches for “piketty” and “world bank gdp,” by state]

Here are some other searches correlated with “piketty” at .94 or higher:

economic consulting firms
eu data protection
exchange rate data
gdp by sector
inflation target
journal of labor economics
london school economics
nber working paper
oecd statistics
oxford economics
panel data stata
stock market capitalization
the economist intelligence unit
us current account deficit
world bank statistics

Well, there goes your rich, liberal, “American left” theory of who’s driving the Piketty phenomenon. It might be true, but it’s not confirmed by the Google search data. My hot new theory: college educated people who are also interested in economics are disproportionately interested in Piketty.

* The reviewer pool: Mervyn King (The Telegraph), Paul Krugman (New York Review of Books), Tyler Cowen (Foreign Affairs), James K. Galbraith (Dissent), Daniel Schuchman (Wall Street Journal), Justin Fox (Harvard Business Review), Michael Tanner (National Review), John Cassidy (New Yorker), Martin Wolf (Financial Times), Jordan Weissmann (Slate), Steven Pearlstein (Washington Post), Scott Winship (National Review), Heather Boushey (Challenge)


Filed under Uncategorized

What do doctors, lawyers, police, and librarians Google?

Now with college teachers!

What do doctors, lawyers, police, and librarians Google? I’ll tell you. But first — if you are going to take this too seriously, please stop now.

Data and Method

Using IPUMS to extract data from the 2010-2012 American Community Survey, I count the number of people ages 25-64, currently employed, in a given occupation. I divide that by each state’s population in that age range (excluding Washington DC from all analyses). I enter those numbers into the Google Correlate tool to see which searches are most highly correlated with the distribution of each occupation across states (the tool reports the top 100 most correlated searches). In other words, these are searches that maximize the difference between, for example, high-lawyer and low-lawyer states — searches that are relatively popular where there are a lot of lawyers, and relatively unpopular where there are not a lot of lawyers.
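The tabulation described above (employed people ages 25-64 in an occupation, divided by the state’s population in that age range, DC excluded) can be sketched as follows. This assumes person-level records have already been extracted; the dictionary keys are hypothetical stand-ins, and IPUMS’s own variables have names like PERWT for the person weight. The resulting state shares are what would be fed to the correlation tool.

```python
from collections import defaultdict

def occupation_per_adult(records, occupation, exclude=("DC",)):
    """Weighted share of adults aged 25-64 employed in `occupation`, by state.

    records: iterable of dicts with hypothetical keys
      state, age, employed (bool), occ, weight (person weight).
    """
    workers = defaultdict(float)
    adults = defaultdict(float)
    for r in records:
        if r["state"] in exclude or not 25 <= r["age"] <= 64:
            continue
        adults[r["state"]] += r["weight"]          # denominator: everyone 25-64
        if r["employed"] and r["occ"] == occupation:
            workers[r["state"]] += r["weight"]     # numerator: employed in the occupation
    return {state: workers[state] / total for state, total in adults.items()}
```

Using survey weights rather than raw record counts matters because ACS respondents represent different numbers of people.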

Is this what lawyers actually Google? We can’t know. But I think so. Or maybe what people who work in law firms do, or people who live with lawyers. It’s a very sensitive tool. I made this case first in the post, Stuff White People Google. Check that out if you’re skeptical.

For each occupation, I first offer a few highly correlated searches that support the idea that the data are capturing what these people search for. Then I list some of the interesting other hits from each list.

Results

Police

Police per adult


The map of police per adult looks pretty random, but the list of correlated search terms doesn’t. On the list are “security training,” “tsa jobs,” “waist belt,” “weight vest,” and “air marshals.”

After all the security stuff, the only major category left in the 100 searches most correlated with police in the population is women. Specifically, their search taste includes tough actress Rachel Ticotin, body builder Denise Masino, Brazilian actress Alice Braga, actress Rosario Dawson, and “israeli women.” (Remember, Google suppresses known porn terms, so this is just what got through the filter.) It’s a leap from these data to the statement, “police search for images of these women,” but this is who they would find if that were the case (is this a “type”?):

[Figure: images of the women named above]

Librarians

Librarians per adult


On the other hand, librarians. They are the smallest occupation I tried: the average state population aged 25-64 is only one tenth of one percent librarians. Yet, their distribution leaves an unmistakable trace in the Google search patterns. It especially seems to pick up terms associated with public libraries. Correlated terms include “cataloguing” and “quiet hours.” And then there are terms one might ask a librarian about, classic reference-desk questions such as “which vs that,” “turn off track changes,” “think tanks,” “9/11 commission,” and “irs form 6251”; and term paper topics like Shakespeare titles or “human development report.”

What about the librarians themselves, or those close to them? Could it be they who are searching for Ann Taylor dresses, Garnet Hill free shipping, Lands End home, and textile museums? We can’t know for sure. Of course, if anyone knows how to cover their search tracks, it might be this crowd.

Doctors

Doctors per adult


You know they’re doctors, because the search terms most correlated with the map include “md, mph,” “md, phd,” “nejm,” “journal medicine,” “tedmed,” and “groopman.” What else do they like? Chick Corea, Tina Fey, Larry David, Mad Men (season 1) and The West Wing, Laura Linney, John Oliver, Scrabble 2-letter words, and a bunch of Jewish stuff.

Lawyers

Lawyers per adult


That’s the map of lawyers per adult across states. Is it really lawyers? The top 100 searches correlated with the distribution shown above include “general counsel,” and then a lot of financial terms like, “world economic forum,” “international finance corporation,” and “economist intelligence.” Then there are international travel terms, like, “rate euro dollar,” “royal air,” and “swiss embassy.”

Looks like lawyers in lawyer-land are richer and more finance-oriented than lawyers in general. On the cultural side, they search for clothing terms Massimo Dutti, Hugo Boss, and Benetton. They apparently like to eat at Zafferano in London, and drink Caipirinhas. Also, they like “vissi,” which is an aria from Tosca but also a Cypriot celebrity; I lean toward the latter, because Queen Rania is also on the list. Finally, they combine their interests in law, finance, and wealthy attractive women by searching for Debrahlee Lorenzana, the “too-hot-for-work” banker.

By popular demand: Post-secondary teachers

Post-secondary teachers per adult

Finally, here without comment are the results for “post-secondary teachers,” which includes any college teacher who didn’t instead specify a specialty, such as “psychologist” or “economist.” (It’s hard to see on the map, but Rhode Island is the highest.) I broke the results into four rough categories:

Academic

attribution
balderdash
bmi index
body image
citation style
cpdl
critical theory
debt to equity
debt to equity ratio
democracy in america
dihedral
economic inequality
economic statistics
economists
educause
edward elgar
effect size
email forward
equals sign
exogenous
feminists
google scholar
growth rates
homomorphism
inflation rate
inflation rates
intelligibility
international study
isomorphic
journal of
journal of nutrition
marginal propensity
marginal propensity to consume
mediating
meters per second
milieu
overlaying
piano sonata
prefrontal
prefrontal cortex
profile of
psychology studies
quick ratio
rejection letter
returns to scale
routledge
scholar
subgroup
superscript
transglutaminase
ways to end a letter

Personal

1% milk
2006 olympics
best pump up songs
crib safety
easy halloween costume
graco snug
handel
ipod history
jackson superbowl
janet jackson superbowl
mastermind game
maxim online
minesweeper
most popular names
napping
national sleep foundation
olympic figure skating
olympics 2006
pairs figure skating
positioning
refereeing
sandra boynton
senior hockey
snl clips
stuff magazine
stumbled upon
toilet training
verum

Musical

1812 overture
acapella group
acapella groups
africa toto
ave verum
for the longest time
it breaks my heart
pdq bach
taylor swift

Birth control

apri
apri birth control
aviane

Conclusions

Poor social scientists, generations of them spending their lives raising a few thousand dollars to ask a few thousand people a few hundred stilted, arbitrary survey questions. Meanwhile, coursing through the cable wires below their feet, and through the air around them, billions of data bits carry so much more potential information about so many more people, in so many intimate aspects of their lives, than we could even dream of getting our hands on. Just think of the power!

Note: I’ve done many posts like this. Some use time series instead of geographic variation, some use terms from Google Books ngrams. Browse the series under the Google tag, or check out this selection:

 

 


Filed under Me @ work

Who’s worried about abstinence?

Probing the deep structure of the collective psyche, or just noise? Either way, kind of interesting.

Are people Googling “abstinence” worried more generally about children’s behavior — maybe their own children’s behavior? Compare the pattern across states in Google searches for “abstinence” and “b. f. skinner” (correlation .79)*:


Searches for “abstinence” (left) and “b. f. skinner” (right)

Out of the top 100 most-correlated-with-“abstinence” searches, these are the others that plausibly have to do with children’s behavior (correlated between .79 and .87):

attention deficit disorder
attention deficit hyperactivity
attention deficit hyperactivity disorder
b.f.
b.f. skinner
behaviors
behavior problems
girls basketball team
hyperactivity
hyperactivity disorder
pregnancies
punishment
student motivation

My mental image here is one of parental desperation, a parent who one day is thinking of how to get her daughter onto the girls’ basketball team and Googling “student motivation,” and the next day is back to “punishment.”

Two other things about the Abstinence Searchers. One is they may be health worriers generally, and/or have health problems (or live in communities with these problems), because these are also in the top 100:

bowel syndrome
cancer facts
coping with
coping with stress
diseases
disorders
effects of drinking
eye disorders
gastric ulcers
heart attacks
heart disease
infant death
infant death syndrome
irritable bowel syndrome
muscular dystrophy
obesity
phenylketonuria
reflux disease
sleeping disorders
sudden infant
sudden infant death
sudden infant death syndrome

This second list makes me more sympathetic to the Abstinence Searchers. On the other hand, it looks like there is a lot of homeschooling going on here as well (the correlation of “abstinence” with “homeschooling” is .54, not in the top 100 but pretty good). These are also in the top 100:

activities for
activities for preschool
activities for preschoolers
activities for students
classroom activities
classroom activity
educational activities
list of famous people
list of the 50 states
projects for students
pronunciation of
pronunciations
textbook publishers
topics
well-known
word games
http://www.census.gov

I am not in favor of abstinence education because it doesn’t serve children well, and I like the idea of children being taught complete information by trained professionals. I would never draw conclusions from this kind of superficial analysis, but it’s a little depressing.

* Note, perhaps due to an outbreak of abstinence education in Mississippi, the number of searches there was an outlier, so I top-coded Mississippi at just over the level of the next-highest state, South Dakota.
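Top-coding an outlier this way (the same kind of fix as trimming Utah’s marriage rate from 61 to 57) just pulls the one extreme value down to slightly above the runner-up, so it still ranks first without dominating the correlation. A minimal sketch; the function name and the size of the margin are hypothetical, since the post says only “just over.”

```python
def top_code(values, key, margin):
    """Pull values[key] down to just above the next-highest value.

    values: dict mapping state to a number.
    margin: how far above the runner-up to place the top-coded value.
    """
    runner_up = max(v for k, v in values.items() if k != key)
    capped = dict(values)  # leave the original data untouched
    capped[key] = min(values[key], runner_up + margin)
    return capped
```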


Filed under Uncategorized

Here it is, your moment of White

A couple years ago, in a post called “Stuff White People Google,” I showed which Google search patterns were most highly correlated with the representation of different race/ethnic groups in the Census. That was a much better post than this.

This is a moment-of-White followup.

Here are Whites, by county, from this tool:

[Map: percent White, by county]

Here are the searches for “back in black,” from Google Correlate:


Google searches for “back in black”

And here is the correlation between searches for “back in black” and searches for “kitten pictures,” by state:

[Figure: searches for “back in black” versus “kitten pictures,” by state]

The scales are normed to a mean of 0 and standard deviation of 1 by Google, I think. I made the graph in Stata with this command (which I’m putting here because I always forget this syntax):

gr twoway scatter backinblack kittenpictures, mlabel(state) mlabposition(0) msymbol(i)

Random question

So, if it is Whites doing the searching for “back in black” and “kitten pictures,” is it possible that the searches are going on in the same households with some kind of gender division?

[Photo: AC/DC fans]

Don’t let that selectively chosen picture fool you. According to the Alexa web traffic site, visitors to acdc.com skew only slightly male. And Facebook tells me I can reach a mostly- but not overwhelmingly-male mix of 3 million women versus 4 million men if I target people with an interest in AC/DC for an ad. (However, if people Googling AC/DC are looking for guitar tabs, maybe it’s the intersection of guitar and AC/DC as interests that matters.)

On the other hand, cuteoverload.com, which is loaded with kitten pictures, skews strongly female, and Facebook tells me that “cat pictures” as an interest will attract women more than men at a ratio of 4-to-1 (much more skewed than the general interest in cats: 1.5-to-1).

Anyway, this might not be the best case. I wonder what other examples there might be of a specific group (e.g., Whites) being divided between men who have a uniquely strong interest in something (AC/DC) and women who have a uniquely strong interest in something else (kitten pictures), with low overlap between the genders. That would be neat – intersectionality seen in Google search patterns.

So

Anyway, it’s time for another year of graduate student admissions. If you or someone you know likes playing with data and making graphs, pursuing hunches about social patterns (more or less important than the ones here), and reading and writing a lot, maybe you or your friend should be in next year’s pile of applications.


Filed under In the news

What’s been queered?

How much has the term, and concept, of queer penetrated the discourse of sexuality, politics and identity?

In the overall use of the terms queer sexuality, queer politics, and queer identity, according to the Google ngrams database of American English usage, queer politics occurs most often, and queer sexuality is last.

[Figure: usage of “queer sexuality,” “queer politics,” and “queer identity”]
Source: Google ngrams.

On the other hand, as a fraction of references to politics, identity, and sexuality respectively (what you could call the relative penetration of queer), the order is different: queer sexuality has most successfully entered the discourse on sexuality, with queer politics and queer identity well behind in their relative niches:

[Figure: relative penetration of “queer” in discourse on sexuality, politics, and identity]

Source: Google ngrams.

(In all of these I used both capitalized and un-capitalized versions. Follow the links to modify the codes yourself.)
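The “relative penetration” measure is a ratio: how often “queer X” appears, relative to how often X appears, with capitalized and lowercase variants summed as described above. A sketch with hypothetical function names, taking frequencies already read off the Ngram Viewer for a given year:

```python
def combined_freq(freqs, phrase):
    """Sum the lowercase and capitalized forms of a phrase (e.g. 'queer politics' + 'Queer politics')."""
    return freqs.get(phrase, 0.0) + freqs.get(phrase.capitalize(), 0.0)

def relative_penetration(freqs, base_term, modifier="queer"):
    """Frequency of 'modifier base_term' as a fraction of the frequency of base_term."""
    return combined_freq(freqs, f"{modifier} {base_term}") / combined_freq(freqs, base_term)
```

Dividing by the base term is what reverses the ordering: “queer politics” can be the most common phrase overall while still being a small slice of all talk about politics.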


Filed under In the news

Family Inequality marriage forecast contest

Enter to win: How many people will get married last year?


An outfit called Demographic Intelligence, which I’ve written about before, got USA Today to do a story on their new U.S. Wedding Forecast™.

Although there’s no sign of him on the website anymore, DI was founded by W. Bradford Wilcox, according to the Wayback Machine‘s archive. Now it is reportedly run by Samuel Sturgeon, who did a little work for the Heritage Foundation while working on his PhD on welfare and abortion policy with David Eggebeen at Penn State (who joined Mark Regnerus in a Supreme Court brief opposing marriage equality).

Anyway, the wedding forecast is available for sale only, and the formula is a secret (the ™ is for “trust me”). But they leaked some details to USA Today.

The company projects a 4% increase in the number of weddings since 2009, reaching 2.168 million this year; 2.189 million in 2014. Depending on the economic recovery, the report projects a continuing increase to 2.208 million in 2015. … From 2007 to 2009, the number of marriages each year fell from 2.197 million to 2.080 million. The report estimates that more than 175,000 weddings have been postponed or foregone since the recession began.

OK, so the projection for 2013 is 2.168 million. The story doesn’t say what DI forecasts for 2012 — which has already happened, although the official number hasn’t been released. But we should be cautious before buying wedding futures, because, according to USA Today: “This is the company’s first foray into wedding forecasts.”
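The USA Today figures can be checked with a little arithmetic, using only the numbers quoted above (in millions of marriages):

```python
def pct_change(old, new):
    """Percent change from old to new."""
    return 100.0 * (new - old) / old

projected_rebound = pct_change(2.080, 2.168)   # 2009 actual -> 2013 projection
recession_drop = pct_change(2.197, 2.080)      # 2007 -> 2009
```

The first comes to about 4.2 percent, matching the story’s rounded “4% increase”; the second is a drop of about 5.3 percent, so DI is projecting that most, but not all, of the recession decline comes back by 2013.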

Do it yourself™

I’ve made a forecast, but like DI-LLC™, I’m sealing it till the end of the contest. But I’ll give you a few data points so you can enter your predicted number of marriages in the U.S. in 2012. The person whose prediction, posted in the comments, is closest to the actual number reported on this page will win a free Family Inequality t-shirt, if I ever get around to making them. In the event of a tie, the prediction posted earliest wins.
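The contest rule (closest prediction wins, ties go to whoever posted first) amounts to sorting on two keys. A sketch with a hypothetical function and made-up entries:

```python
from datetime import datetime

def contest_winner(entries, actual):
    """Pick the winning entry.

    entries: list of (name, predicted_marriages, posted_at) tuples.
    The closest prediction wins; ties go to the earliest post.
    """
    best = min(entries, key=lambda e: (abs(e[1] - actual), e[2]))
    return best[0]

# Hypothetical entries, for illustration only.
example_entries = [
    ("a", 2_100_000, datetime(2013, 1, 2)),
    ("b", 2_180_000, datetime(2013, 1, 3)),
    ("c", 2_140_000, datetime(2013, 1, 1)),
]
```

If the actual number came in at 2,160,000, entries “b” and “c” would both miss by 20,000, and “c” would win for posting earlier.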

Here is the trend from 2000 to 2011. Observe: long-run decline, recession-spike down, then rebound.

[Figure: U.S. marriages, 2000-2011]

So, is the rebound just a little catching up from delayed marriage, or what? That’s the question. DI says 2,168,000 by 2013. Go marriage!

The USA Today story reports that DI’s forecast is “based on a variety of measures, including unemployment and consumer confidence.” I got some of that for you. I also added the number of women in the US ages 20-39 (who account for about 75% of marriages); these children of the Baby Boomers are producing a little population bulge, which could bring more marriages even at falling rates.

In what could be bad news for the DI forecast, however, I also checked the Google search trends for “wedding invitations,” “bridal shower,” and “wedding gifts.” These are the trends, shown in 3-week moving averages, with each normed so that 100% was the most popular week (the originals are here). See the big rebound continuing in 2012? Me neither. Click to enlarge:

[Figure: Google search trends for “wedding invitations,” “bridal shower,” and “wedding gifts”]

I annualized those numbers for each year 2004 to 2013, with a seasonal adjustment for the first 24 weeks of the year (don’t ask).
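The smoothing and norming described above can be sketched as follows. The post doesn’t say whether its 3-week averages are centered or trailing; this version assumes centered, and it leaves out the annualizing and the seasonal adjustment. Function names are hypothetical.

```python
def centered_moving_average(xs, window=3):
    """Centered moving average; drops the ends where the window doesn't fit."""
    if window % 2 == 0:
        raise ValueError("use an odd window so it can be centered")
    half = window // 2
    return [sum(xs[i - half:i + half + 1]) / window for i in range(half, len(xs) - half)]

def norm_to_peak(xs):
    """Rescale so the most popular week equals 100."""
    peak = max(xs)
    return [100.0 * x / peak for x in xs]
```

Norming to the peak week makes the three search terms comparable in shape, but it also means each series is relative to its own best week, not to the others.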

But are the Google numbers good for prediction? I used them to predict another down year for marriage in 2011. That wasn’t borne out by the vital statistics numbers, which rebounded (as shown in the chart above). On the other hand, the numbers from the American Community Survey showed the decline continuing into 2011, as reported by Pew. On the third hand, ACS shows continued decline in marriage rates (the difference is in the number of marriage-aged people). We don’t know enough about the difference between ACS and vital statistics to interpret this yet. Uncharted waters.

So, here are your numbers, with everything up to 2013 except the outcome: the number of marriages. Feel free to use these or anything else you like. Or just guess. Remember, Demographic Intelligence boasts of 99% accuracy, but except for 2009 you would have been at 97.5% or better just guessing no change — so you’re bound to be close. The contest is for 2012, but 2013 forecasts are welcome, too — better early than never. Click to enlarge:

[Table: forecast variables through 2013, without the number of marriages]

To make it easier, I’ve uploaded the spreadsheet, with sources, here.
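The “just guess no change” benchmark mentioned above is easy to compute for any series. A minimal sketch, assuming “accuracy” means 100 × (1 − |forecast − actual| / actual); DI’s own definition isn’t given, and the function name is hypothetical.

```python
def no_change_accuracy(series):
    """Accuracy (%) of forecasting each year's value as last year's value.

    series: dict mapping consecutive years to counts.
    Accuracy here is 100 * (1 - |forecast - actual| / actual), an assumed definition.
    """
    years = sorted(series)
    return {
        year: 100.0 * (1 - abs(series[prev] - series[year]) / series[year])
        for prev, year in zip(years, years[1:])
    }
```

Because marriage counts change by only a few percent in most years, this naive forecast scores in the high 90s almost automatically, which is the point of comparing it with a claimed 99%.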

In marketing terminology, these variables are very hot leads. Here are the correlations between each variable and the number of marriages, for the years 2004-2011:

[Table: correlations between each variable and the number of marriages, 2004-2011]

For other posts about prediction, see:

Good luck!


Filed under Uncategorized

Marriage is going down, so what does Kanye West have to do with it?

The marriage rate has fallen almost continuously for more than half a century, from a sky-high 90 per 1,000 unmarried women in 1950 (meaning almost 1 in 10 single women got married that year) to a bare 31 per 1,000 in 2011. Splashdown appears imminent.

[Figure: marriages per 1,000 unmarried women]

Sources: 1940-1960; 1970-2011.

Social scientists understand that there is a combination of demographic, economic, policy, and cultural factors involved. These include the aging population, men’s declining fortunes, the incarceration of millions of poor men, the rise of secular ideology and the sexual revolution.

Often, however, cultural influence is left to what you might call residual interpretation. Proving that culture affects demographic trends is difficult. Instead, people consider how demographic, economic and policy factors play their roles, and then attribute what’s left of the trend to culture.

Recently, the National Center for Family and Marriage Research at Bowling Green State University reported the marriage rate for each state and D.C., ranging from 61 marriages per 1,000 unmarried women in Utah down to 19 per 1,000 in Washington, D.C. and 20 in Rhode Island. To explain the pattern using normal demographic practices, I gathered some other data about states from the Census Bureau: the percent of the population over 65, percent female, percent with a BA or higher education, population density, per capita income, and race/ethnic composition. With that information, using a regression, I can guess the marriage rate to within 3.1 points on average. This is what the regression looks like, showing what happens when I start with age and sex composition, add income and education, and then add race/ethnicity:

[Table: regression of state marriage rates on demographic and economic characteristics]

In statistical terms (R²), my simple model explains 73 percent of the variation in marriage rates, which is pretty good. Before I would use the marriage rate as an indicator of something like “culture,” then, I would say most of what’s going on reflects larger demographic and economic patterns that we more or less understand. The differences that remain, however, still might be the result of cultural, religious, or attitudinal factors that are harder to assess. (I stress this is not about low Black marriage rates: note the population percentage Black has no effect once the other factors are controlled.)
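Both fit statistics quoted here (the 73 percent R² and the 3.1-point average error) come straight from a regression’s residuals. A sketch with a hypothetical helper: R² is one minus the residual sum of squares over the total sum of squares, and “within 3.1 points on average” is read here as the mean absolute error, an assumption, since the post doesn’t name the measure and root mean squared error would be another candidate.

```python
import numpy as np

def ols_fit_stats(y, *predictors):
    """OLS with an intercept; returns coefficients, R-squared, and mean absolute error."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    centered = y - y.mean()
    r_squared = 1 - (resid @ resid) / (centered @ centered)
    mae = float(np.mean(np.abs(resid)))
    return coef, float(r_squared), mae
```

Adding a predictor (such as a search-term series) can only raise R² in the same sample, which is one reason a jump from 73 to 81 percent needs the cautions that follow.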

Culture, meet big data

What about big data, the billions of bits of information people leave strewn around wherever they go? Marketers and government spying agencies make most of the headlines, but social scientists, too, are scraping up millions of words and turning them into analyzable numbers, so they can tell you things like:

One of the easiest sources to use for this kind of thing is the Google Correlate tool, which finds the search terms whose frequency most closely follows a specified pattern. I entered the marriage rate for each state, shown on the map on the left, with darker green indicating higher marriage rates. Google Correlate tells me which searches track this variation: which searches are most popular in Utah, least popular in D.C., and so on. (I actually trimmed the Utah rate so it wouldn’t be such an outlier, from 61 down to 57, just above the next highest.) It turns out the most correlated search is for “rolls recipe,” which is correlated with the marriage rate at .85 on a scale of -1 to 1.

[Maps: marriage rate by state (left) and searches for “rolls recipe” (right)]

But since my interest is in the decline of marriage, I multiplied the marriage rate by -1 and tried again (so now darker green indicates a lower marriage rate). The answer, overwhelmingly: Kanye West. (Experts at finding any website anywhere will know that he’s a never-married proud father-to-be with co-parent Kim Kardashian.)

[Maps: inverse marriage rate by state and searches for “kanye west my beautiful dark twisted fantasy”]

That correlation between the inverse of the 2011 marriage rate and “kanye west my beautiful dark twisted fantasy” (his last album) is .81. Further, Google produces the top 100 most correlated searches, and of those, no fewer than 28 were about Kanye West (such as “kanye west new album,” “devil in a new dress lyrics” and “air yeezys”). Another 16 were other hip-hop searches, including some about Jay Z and Lil Wayne. Other apparent themes include mafia-related entertainment (“sopranos episode,” “pacino movies,” “corleone”), Sex and the City, and shopping at Marshalls.

Does this tell us more than the simple demographic analysis I did above? When I put the top Kanye search into my model, it has the strongest effect, and the variance explained jumps to 81 percent. The model now can predict the marriage rate to within 2.5 points on average. It’s a very good predictor, and it’s not just reflecting simple demographics like age, gender and race. Whether Kanye is in the analysis or not, Black population percentage has no effect on this prediction. Here is the regression, with new parts in red:

[Table: regression with the top Kanye West search added]

Explanations

So, I dredged all the search data in the world for something correlated with marriage rates, and found something. But what does it mean? Two cautionary stories are revealing. Forecasting guru Nate Silver has a good description of how noise can look like signal. For example, with the tens of thousands of economic statistics available to build a forecasting model, finding a pattern after the fact is deceptively easy. But it usually doesn’t work for predicting future economic trends.
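Silver’s point is easy to demonstrate with pure noise. The sketch below, with hypothetical parameters, draws a random “target” for 50 units (think states) and 10,000 random candidate series, then reports the largest absolute correlation among them. None of the candidates has any real relationship to the target.

```python
import numpy as np

def best_spurious_r(n_units=50, n_candidates=10_000, seed=0):
    """Largest |Pearson r| between one random target and many random candidates."""
    rng = np.random.default_rng(seed)
    target = rng.normal(size=n_units)
    candidates = rng.normal(size=(n_candidates, n_units))
    # z-score the target and each candidate row (population SD)
    t = (target - target.mean()) / target.std()
    c = candidates - candidates.mean(axis=1, keepdims=True)
    c /= c.std(axis=1, keepdims=True)
    r = c @ t / n_units          # one correlation per candidate
    return float(np.abs(r).max())
```

With these defaults the best match typically lands somewhere around |r| = .5, from no signal at all; the exact value depends on the seed. Correlate searched far more than 10,000 series, so the bar for “surprisingly high” is higher still.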

Another caution comes from genomic studies. In a study of, say, cancer genetics, statisticians may conduct millions of tests for the association between any genetic variant and the occurrence of cancer. With the typical definition of “statistical significance” (which tolerates a 5 percent random chance of being wrong), that means they’d find hundreds of thousands of bogus “significant” associations. So good scientists set their significance threshold for such studies much tighter, more like .000005 percent than 5 percent. That way they are sure to only blow the whistle on genes if the chances of being wrong are vanishingly small.
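The arithmetic behind this paragraph: with m tests of true null hypotheses at level α, the expected number of false positives is m × α, and the Bonferroni correction keeps the chance of any false positive at or below α by testing each one at α / m. A sketch with hypothetical function names:

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of 'significant' results when every null hypothesis is true."""
    return n_tests * alpha

def bonferroni_threshold(n_tests, family_alpha=0.05):
    """Per-test p-value cutoff that keeps the chance of any false positive <= family_alpha."""
    return family_alpha / n_tests
```

For a million tests, that is 50,000 expected false positives at the usual level, and a per-test cutoff of 5 × 10^-8 (.000005 percent).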

So, this is a suggestive game of Big-Data Craps, not real research. It’s meant to provoke a little. I hope we’ll think creatively about new kinds of data we can use. Also, I want to generate ideas about cultural explanations for demographic trends. It should be at least as useful as some pundit simply declaring, for example, that gay marriage is killing real marriage. (“As the cause of gay marriage has pressed forward,” wrote Ross Douthat, “the social link between marriage and childbearing has indeed weakened faster than before.” That theory has about as much going for it as one linking the decline of marriage to the rise of high fructose corn syrup or the explosion of red cards in World Cup soccer.)


Kanye’s fantasy

With those caveats, here are three possible explanations for the finding:

  1. Google, by trawling through millions of search term patterns, has come up with a random bit of noise that just happened to catch my attention. There’s nothing there, really.
  2. The hip-hop Google search is capturing a more finely grained demographic pattern than I did with my simple Census numbers. So what matters for marriage is not just things like the percentage female, education levels and racial composition of the population, but the presence of particular combinations of these demographic groups. Hip hop’s audience is notoriously difficult to define (it’s featured on top-five radio stations in markets such as San Francisco and Los Angeles as well as Detroit and Atlanta), but it’s certainly not as simple as age, gender and race.
  3. Hip-hop actually is weakening marriage in America. People who listen to Kanye West and other hip-hop music are taken in by the music’s consumerist individualism and shun marriage, with its staid image of tradition, conformity and restraint. As a result, they are less likely to get married than the people Googling “rolls recipe.”

I lean toward explanation #2. Explanation #3 might have something to it. As the philosopher xkcd wrote, “correlation does not imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing ‘look over there.’” But I wouldn’t draw that conclusion without a lot more evidence, including comparisons with other cultural factors, like other kinds of music or religious patterns. Since I have no expertise in hip hop (post-1989), I would be glad to hear from people who know about it for realz.

Addendum: Here’s a scattergram showing the correlations between some of the variables in the regression. In each cell there’s a dot for every state plus DC. The Kanye variable is scaled (by Google) to have a mean of 0 and a standard deviation of 1.

[Figure: scattergram matrix of selected regression variables, one dot per state plus DC]
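Scaling a variable to a mean of 0 and a standard deviation of 1 (a z-score) is a simple linear transformation. A minimal sketch follows; it assumes the population standard deviation (dividing by n), since the post doesn't say which version Google used, and `standardize` is a hypothetical helper. Because the rescaling is linear with a positive scale factor, it leaves the Kanye variable's correlations with the other variables unchanged; only the units change.

```python
import math

def standardize(values):
    """Return z-scores: (value - mean) / SD, using the population SD (divide by n)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("cannot standardize a constant series")
    return [(v - mean) / sd for v in values]

z = standardize([2, 4, 4, 4, 5, 5, 7, 9])
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```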


Filed under Uncategorized

What are we becoming a nation of now?

In his TED talk “How common threats can make common political ground,” Jonathan Haidt mentions an influential New York Times article about how people with college degrees are more likely than those without them to get and stay married.

At about 15:20 in the talk, Haidt says: “We are becoming a nation of just two classes.”

And I got to thinking about that phrase, “become a nation of…” It puts the reader at the moment of a transition from an assumed past to a specified future. A Google Books search reveals that we have become a nation of many things over the years:

1805: Becoming a nation of free men.

1815: becoming a nation of drunkards.

1822: becoming a nation of castes.

1840: becoming a nation of bull-dogs.

1856: becoming a nation of music lovers in the legitimate sense of the term.

1905: becoming a nation of dreamers, and then, in the next sentence, becoming a nation of money lovers and materialists.

1905: becoming a nation of physicians or even of lawyers.

1944: fast becoming a nation of neurotics.

1953: becoming a nation of coffee drinkers instead of one of tea drinkers, like England.

1969: becoming a nation of two societies— one white and one black— separate and unequal. (from this awesome issue of Ebony:)

[Image: the 1969 issue of Ebony]

1977: becoming a nation of the elderly.

1985: Becoming a Nation of Readers.

1987: becoming a NATION OF ILLITERATES.

1988: becoming a nation of hamburger stands, and, in the same sentence, becoming a nation of management consultants, doctors, software designers, and international bankers.

1989: Becoming a Nation of Burger Flippers?

2008: becoming a nation of joiners.

2008: becoming a nation of orthorexics (people with an unhealthy obsession with healthy eating).


Filed under In the news

Do people work in working families?

It’s not that “working families” don’t exist; it’s just that, the way most people use the term, it doesn’t mean anything.

Search Google images for “working families,” and you’ll find images like this:

[Image: a Google Images result for “working families”]

And that’s pretty much the way the term is used: every family is a working family.

To hear the White House talk, you’d wonder whether there are people who aren’t in families. I’ve complained before about Obama’s tendency to say things like, “This reform is good for families; it’s good for businesses; it’s good for the entire economy,” as if “families” covers all people.

Specifically, if you use Google to search the White House website’s press office directory, which is where the speeches live, for “working families,” you get 457 results, such as this transcript of remarks by Michelle Obama at a “Corporate Voices for Working Families” event. The equivalent search for “working people” yields a paltry 108 hits (many of them Obama speeches at campaign events, and including false positives, like his ridiculous claim that Americans are the “hardest working people on Earth”). If you search the entire Googleverse for “working families,” you get about 318 million hits, versus just 7 million for “working people” (fewer than the 10 million that turn up for “Kardashians,” whatever that means).

You would never know that 33 million Americans live alone, in one-person households that make up 27% of all households. And 50 million people, or one out of every six, live in what the Census Bureau defines as a “non-family household,” a household in which the householder lives alone or only with nonrelatives (some of those people may be cohabitors, however). The rise of this phenomenon was ably described by Eric Klinenberg in Going Solo: The Extraordinary Rise and Surprising Appeal of Living Alone.

This is partly a complaint about cheap rhetoric, but it’s also about the assumption that families are primary social units when it comes to things like policy and economics, and about the false universality of “middle class” (which is made up of “working families”) in reference to anyone (in a family with anyone) with a job.

Here’s one visualization, from a Google ngrams search of millions of books. The blue line is use of the phrase “working people” as a fraction of references to “people,” while the red line is use of the phrase “working families” as a fraction of references to “families.” It shows, I think, that “working” is coming to define families, not people.

[Figure: Google Ngram chart of “working people” and “working families”]

This isn’t all bad. Families matter, and part of the attention to “working families” (or Families That Work) is driven by important problems of work-family conflict, unequal care work burdens, and so on. But ultimately these are problems because they affect people (some of whom are in families). When we treat families as the primary unit of analysis, we mask the divisions within families (the conflicts of interest and exploitation, the violence and abuse, and the ephemeral nature of many family relationships and commitments), and we contribute to the marginalization of people who aren’t in, or don’t have, families. And those members of the No Family community need our attention, too.
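The two lines in the Ngram visualization described earlier are ratios: each year's count of a phrase divided by that year's count of its base word. A minimal sketch, assuming you already have yearly counts in hand; `phrase_share` is a hypothetical helper, and the counts below are invented for illustration, not real Ngram data.

```python
def phrase_share(phrase_counts, base_counts):
    """For each year, the phrase's count as a fraction of the base word's count.
    Years missing from base_counts, or with a zero base count, are skipped."""
    return {
        year: phrase_counts[year] / base_counts[year]
        for year in phrase_counts
        if base_counts.get(year)
    }

# Invented counts for illustration only (not Google Ngram data).
working_families = {1980: 120, 2000: 900}
families = {1980: 60_000, 2000: 90_000}
print(phrase_share(working_families, families))  # {1980: 0.002, 2000: 0.01}
```

The same function applied to “working people” and “people” would give the other line; comparing the two shares over time, rather than raw counts, keeps the overall growth in books from driving the trend.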


Filed under Me @ work, Politics