This post contains racist language.
Updated: See the comment note and data caution at the end.
This is purely observational, not causal. People Google for racist, sexist, and anti-Semitic jokes more in states that are more favorable toward Trump in the presidential election.
The point of the exercise, as suggested by Seth Stephens-Davidowitz in a 2012 paper published here and discussed here, was to look for population traits that might skew votes in ways the polls did not predict. If people were racist, maybe they would not admit they opposed Obama, but they would still Google “nigger jokes” in their spare time. We don’t yet know whether the polls will accurately capture the vote outcome this year, but I’m interested in the underlying patterns anyway.
I use state data from Google Trends, which coughed up relative search frequencies by state for the past five years. Each search term is scaled from 100 in the state with the highest search frequency downward toward zero, although in practice no state scores zero. For example, West Virginia scores 100 on searches for “nigger jokes” and Oregon scores 17 (the lowest score). Trends does not report the actual number of searches, and some small states are not reported for some jokes, presumably because the data are too sparse.
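Google doesn’t publish its exact procedure, but its help pages describe dividing each state’s searches for a term by that state’s total searches and then scaling so the top state gets 100. Assuming that description is accurate, a minimal Python sketch shows why the lowest state need not be zero: the scores are proportional to the maximum rather than stretched between the maximum and the minimum. The state labels and search shares below are made up, and `trends_scale` is a hypothetical helper, not part of any Google tool.

```python
def trends_scale(shares):
    """Rescale per-state search shares so the highest state scores 100.

    Assumption: each value is a state's share of all its searches, and
    Trends divides by the largest share (no min-max stretching), then
    rounds to an integer. So the lowest state only hits 0 if its share is ~0.
    """
    top = max(shares.values())
    if top <= 0:
        raise ValueError("need at least one positive share")
    return {state: round(100 * s / top) for state, s in shares.items()}


# Hypothetical search shares (fraction of all searches), not real data.
toy_shares = {"A": 2.0e-6, "B": 1.2e-6, "C": 0.5e-6}
print(trends_scale(toy_shares))
```

Under this assumption, a state with a quarter of the top state’s search share scores 25, not 0, which is consistent with the lowest scores in the real data sitting well above zero.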
So here I compare search frequencies for three offensive kinds of jokes, “blonde jokes” (N=48), “nigger jokes” (N=38), and “holocaust jokes” (N=29), where N is the number of states Trends reported for each term. I control for two kinds of innocuous jokes, “puns” (favored in Clinton-supporting states) and “knock knock jokes” (favored in Trump states), which might capture the general tendency to Google for jokes. I compared these relative search frequencies to the state polling summary from FiveThirtyEight, which has the Clinton lead ranging from +32.8 in Hawaii to -30.4 in Wyoming (DC is not included here).
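The bivariate numbers are ordinary Pearson correlations across states. Because Trends leaves out some small states for some terms, each correlation can only use the states reported for that term, which is why N differs across the three jokes. A minimal Python sketch of that calculation, assuming the joke scores and poll leads are stored in dictionaries keyed by state (the function names and toy values are hypothetical; the actual analysis was done in Stata):

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlate_by_state(joke_scores, clinton_lead):
    """Correlate a joke score with the Clinton lead over the states that have both.

    States with no reported score (None) are dropped, so N varies by term.
    Returns (r, N).
    """
    states = sorted(
        s for s in joke_scores
        if joke_scores[s] is not None and s in clinton_lead
    )
    xs = [joke_scores[s] for s in states]
    ys = [clinton_lead[s] for s in states]
    return pearson(xs, ys), len(states)


# Hypothetical toy data: state "D" has no reported joke score.
toy_jokes = {"A": 100, "B": 60, "C": 20, "D": None}
toy_lead = {"A": -30.0, "B": -10.0, "C": 10.0, "D": 30.0}
print(correlate_by_state(toy_jokes, toy_lead))
```

Dropping missing states term by term (pairwise deletion) keeps as many states as possible for each joke, at the cost of comparing correlations computed on slightly different sets of states.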
The bivariate correlations with the Clinton lead are -.67 for “blonde jokes,” -.61 for “nigger jokes,” and -.48 for “holocaust jokes.” Here are the scatters:
Again, nothing causal is claimed here. Just accounting for other joke searching (which is interesting in itself), here are the multivariate results:
“Blonde jokes” provides the best fit, but all three still fit pretty well with the innocuous jokes controlled.
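Assuming each multivariate model regresses the Clinton lead on one offensive-joke score plus the two innocuous-joke controls, the fit is plain ordinary least squares. Here is a minimal sketch that solves the normal equations by Gauss-Jordan elimination so it needs only the Python standard library. The data are noise-free toy numbers built from known coefficients, so the fit should recover them; `ols` is a hypothetical helper, not the post’s Stata code.

```python
def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    rows: one list of predictor values per state (an intercept is added here).
    Returns [intercept, b1, b2, ...].
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = []
    for a in range(k):
        row = [sum(xi[a] * xi[b] for xi in X) for b in range(k)]
        row.append(sum(xi[a] * yi for xi, yi in zip(X, y)))
        A.append(row)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[r][k] for r in range(k)]


# Hypothetical toy states: (offensive-joke score, "puns", "knock knock").
rows = [(100, 50, 80), (80, 70, 60), (60, 40, 90),
        (40, 90, 50), (20, 60, 70), (30, 30, 40)]
true_coef = [5.0, -0.4, 0.3, -0.2]  # intercept, offensive, puns, knock knock
y = [true_coef[0] + sum(c * v for c, v in zip(true_coef[1:], r)) for r in rows]
coef = ols(rows, y)
print([round(c, 6) for c in coef])
```

Real state data would of course have residual noise; a package regression routine would also report standard errors and fit statistics, which this sketch leaves out.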
Incidentally, “puns” has no bivariate correlation with Clinton lead, but with “knock knock” controlled it’s very strong. Go figure!
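This looks like a suppressor pattern, and partial correlation shows how it can happen. The correlation between x and y net of z is r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)), so even when r_xy is zero the partial correlation is not, as long as z is correlated with both. A minimal sketch with hypothetical correlations (not estimates from this data), taking x as “puns,” y as the Clinton lead, and z as “knock knock jokes”:

```python
from math import sqrt


def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z from both."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical correlations, chosen only to illustrate the pattern.
r_puns_lead = 0.0    # no bivariate association between puns and Clinton lead
r_puns_knock = 0.6   # states that search one kind of joke search the other
r_lead_knock = -0.6  # knock knock searches run higher in Trump states
print(partial_corr(r_puns_lead, r_puns_knock, r_lead_knock))
```

With these made-up inputs the partial correlation comes out strongly positive: once the shared joke-searching tendency is removed, what is left of “puns” lines up with the Clinton lead.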
OK, there you have it. Deplorable joke behavior is strongly correlated with Trump support. Nothing causal claimed here.
I put the data and Stata code, including code for the figures, on the Open Science Framework here.
Thanks to the efforts of University of Wisconsin graduate student Nathan Seltzer (see the comment below), it’s come to my attention that the “past five years” data is unstable. Looking just at the “holocaust jokes” data, s/he found non-trivial noise when comparing downloads made just a few hours apart. To check this, I just went and repeated the search, “holocaust jokes” for the “past five years,” and this is what I got:
Yuck. Thanks for the free data, Google! I’m thankful for Nathan pointing this out. Good lesson in the benefits of sharing data so we can find problems like this — and the trouble with counting on non-open, private data providers like Google. When they’re good, they’re good, but they’re non-transparent and unaccountable when they’re not. It would be great if Google figured out what’s going on and fixed their public access tool. If anyone else can explain this I would be interested to hear.
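To put a number on the kind of instability Nathan found, one could download the same term twice and compare the state scores directly. A minimal sketch, assuming each download has been read into a dictionary of state scores; `compare_pulls` and the values are hypothetical:

```python
def compare_pulls(first, second):
    """Summarize disagreement between two downloads of the same Trends series.

    Only states present in both downloads are compared. Returns
    (number of shared states, mean absolute difference, largest absolute
    difference), with differences in Trends points on the 0-100 scale.
    """
    shared = sorted(set(first) & set(second))
    if not shared:
        raise ValueError("the two downloads share no states")
    diffs = [abs(first[s] - second[s]) for s in shared]
    return len(shared), sum(diffs) / len(diffs), max(diffs)


# Hypothetical scores from two downloads a few hours apart.
pull_1 = {"A": 100, "B": 50, "C": 30}
pull_2 = {"A": 100, "B": 62, "C": 21, "D": 40}
print(compare_pulls(pull_1, pull_2))
```

Note that the set of reported states can itself change between downloads, so it is worth checking which states drop in or out, not just how much the shared ones move.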