Social science researchers should get serious about using Google or other search data. Someone has to figure out what we can and cannot learn from this amazing data. Here is some material to help motivate work on the issue.
In each of the figures below, I compare real demographic data on state population composition with Google search patterns, using the Google Correlate tool I’ve used before for divorce rates and Obama votes. If, as you can see, the correlation between the percentage of the population that is non-Latino-single-race-White and searches for “back in black guitar tab” is .89, what does it mean?
In case you’re prepared to be offended, remember that these terms are not most of what these groups search for, or most of the searches in these areas. Rather, they are the things searched for in these states that are not searched for as much in other states. People in all groups search for porn and shopping and restaurant reviews and health conditions, but these are the terms that differentiate the states.
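To make that distinction concrete, here is a minimal Python sketch of a state-level correlation. It is not Google Correlate's actual method, which I have not reproduced; it is just a plain Pearson correlation, and every number in it is made up for five hypothetical states. The names `pearson`, `pct_group`, `popular_term` and `niche_term` are mine, for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# MADE-UP values for five hypothetical states (not real data).
# Percent of each state's population in some demographic group:
pct_group = [10.0, 20.0, 30.0, 40.0, 50.0]

# A hugely popular term searched at about the same rate everywhere:
popular_term = [900.0, 905.0, 898.0, 903.0, 899.0]

# A niche term, rarely searched, whose rate rises with the group's share:
niche_term = [1.0, 2.5, 2.8, 4.1, 5.0]

r_popular = pearson(pct_group, popular_term)
r_niche = pearson(pct_group, niche_term)

print(f"popular term: r = {r_popular:.2f}")
print(f"niche term:   r = {r_niche:.2f}")
```

In this toy setup the popular term has far more searches but only a weak correlation with the group's share, while the niche term has few searches and a strong one. That is the sense in which a top-correlated term differentiates states without being what most people there search for.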
In each of the cases I’ve selected, I strongly suspect that the searchers using these terms are mostly the people in the demographic. But what good is it, and what are the risks?
White-alone, non-Hispanic and “back in black guitar tab” (4th highest correlation):
A bunch of the top terms for the White population were music, such as “walkin on sunshine,” “end of the world as we know it,” “wayward son,” and even “safety dance.”
Black-alone and “regina belle” (top correlation):
Also on the list are several terms about Black colleges, the Pan-Hellenic system, Essence and Ebony, the Obamas and BET.
Latino and “solo tu lyrics” (top correlation):
Most of these are in Spanish and about pop culture.
Asian and “double eyelid” (top correlation):
(I removed Hawaii, which is an extreme outlier, but it didn’t make much difference.)
Lots of these are Korean words, and things about beauty.
American Indian alone and “native threads” (top correlation):
The biggest group here is about government agencies, like the Indian Health Service and Bureau of Indian Affairs; also beading, stitching and music.
Population age 65+ and “fosomax” (2nd highest correlation):
Most of these are old-age related health conditions and drugs (e.g., aortic stenosis), as well as Social Security and sympathy-related quotes (e.g., “losing someone quotes”), “the time of my life lyrics,” and “new family guy season.”
Population with BA degree or higher and “passport expiration” (5th highest correlation):
The Economist features in several places on this list, as does iPod stuff, things about travel to Europe (e.g., exchange rates), and “what the dog saw,” “baby jogger” and “index funds.”
See the complete list in a PDF document here.