Racist, sexist, and anti-Semitic jokes in Trump land

This post contains racist language.

Updated: See comment note and data caution at the end.

This is purely observational, not causal. People Google for racist, sexist, and anti-Semitic jokes more in states that are more favorable toward Trump in the presidential election.

The point of the exercise, as suggested by Seth Stephens-Davidowitz in a 2012 paper published here and discussed here, was to look for population traits that might skew votes in ways the polls did not predict. If people were racist, maybe they would not admit they opposed Obama, but they would still Google “nigger jokes” in their spare time. We don’t yet know whether the polls will accurately capture the vote outcome this year, but I’m interested in the underlying patterns anyway.

I use state data from Google Trends, which coughed up relative search frequencies for the past five years by state. Each search term is scaled from 100 in the state with the highest search frequency of the term to zero for the lowest (except the scores don’t actually go down to zero). For example, West Virginia scores 100 on searches for “nigger jokes” and Oregon scores 17 (the lowest score). Trends does not report the actual number of searches, and some small states are not reported for some jokes, presumably because the data are too sparse.
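A minimal sketch of that scaling, assuming the scores are each state's search share divided by the largest share and multiplied by 100. Google does not publish its exact procedure in this post, and the raw shares and the "Ohio" row below are invented for illustration. Only the West Virginia and Oregon scores (100 and 17) come from the text.

```python
def scale_to_max(raw_shares):
    """Rescale values so the largest becomes 100, rounded to whole numbers."""
    top = max(raw_shares.values())
    return {state: round(100 * share / top) for state, share in raw_shares.items()}


# Hypothetical raw search shares, chosen only to reproduce the two scores
# mentioned in the text. Trends never reports numbers like these.
raw = {"West Virginia": 0.00040, "Ohio": 0.00026, "Oregon": 0.000068}

scaled = scale_to_max(raw)
print(scaled)
```

Under this assumption the lowest state cannot reach zero unless it has no searches at all, which would be consistent with Oregon bottoming out at 17.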

So here I compare search frequencies for three offensive kinds of jokes, “blonde jokes” (N=48), “nigger jokes” (N=38), and “holocaust jokes” (N=29), with controls for two kinds of innocuous jokes, “puns” (favored by Clinton-supporting states) and “knock knock jokes” (favored in Trump states). These controls might capture the general tendency to Google for jokes. I compared these relative search frequencies to the state polling summary from FiveThirtyEight, which has the Clinton lead from +32.8 in Hawaii to -30.4 in Wyoming (DC is not included here).

The bivariate correlations with the Clinton lead are -.67 for “blonde jokes,” -.61 for “nigger jokes,” and -.48 for “holocaust jokes.” Here are the scatters:
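For readers who want to see what a bivariate correlation of this kind involves, here is a small sketch of the Pearson coefficient. The five "states" and their values are invented, so the result will not match the figures reported above. The variable names are assumptions, not the names used in the posted Stata code.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented values for five hypothetical states, not the real data.
clinton_lead = [20.0, 8.0, 1.0, -10.0, -25.0]
joke_score = [30, 45, 50, 70, 95]

r = pearson_r(clinton_lead, joke_score)
print(f"r = {r:.2f}")
```

A negative coefficient here means the same thing as in the post: higher joke-search scores go with a smaller Clinton lead. It says nothing about which individuals are searching or voting.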

Again, nothing causal claimed here. Just accounting for other joke telling (which is interesting in itself), here are the multivariate results:


“Blonde jokes” provides the best fit, but all three still fit pretty well with the innocuous jokes controlled.
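The actual models were run in Stata (see the data and code link below). As a rough Python sketch of the same idea, the following regresses a Clinton lead on one offensive-joke score while controlling for the two innocuous-joke scores. Everything here is synthetic: the data are random, and the coefficients built into the data (for example -0.5 on the offensive score) are assumptions made for the demo, not estimates from the post.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states = 40

# Synthetic stand-ins for the Google Trends scores (roughly 20 to 100).
puns = rng.uniform(20, 100, n_states)
knock_knock = rng.uniform(20, 100, n_states)
offensive = rng.uniform(20, 100, n_states)

# Assumed relationship used to generate the fake outcome, plus noise.
clinton_lead = (
    20 - 0.5 * offensive + 0.2 * puns - 0.1 * knock_knock
    + rng.normal(0, 3, n_states)
)

# Ordinary least squares: intercept, offensive score, and the two controls.
X = np.column_stack([np.ones(n_states), offensive, puns, knock_knock])
coefs, *_ = np.linalg.lstsq(X, clinton_lead, rcond=None)

# R-squared as a simple measure of fit.
fitted = X @ coefs
ss_res = np.sum((clinton_lead - fitted) ** 2)
ss_tot = np.sum((clinton_lead - clinton_lead.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print("coefficient on offensive score:", round(float(coefs[1]), 3))
print("R-squared:", round(float(r_squared), 3))
```

Comparing R-squared across three such models, one per offensive joke type, is one way to read a statement like "best fit," though the Ns differ across the three joke types in the real data, so the models are not fit to the same set of states.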

Incidentally, “puns” has no bivariate correlation with Clinton lead, but with “knock knock” controlled it’s very strong. Go figure!

OK, there you have it. Deplorable joke behavior is strongly correlated with Trump support. Nothing causal claimed here.

I put the data and Stata code, including code for the figures, on the Open Science Framework here.

For other relevant posts follow the Google tag and the Trump tag.


Thanks to the efforts of University of Wisconsin graduate student Nathan Seltzer (see the comment below), it’s come to my attention that the “past five years” data is unstable. Looking just at the “holocaust jokes” data, s/he found non-trivial noise comparing downloads made just a few hours apart. To check this, I just went and repeated the search: “holocaust jokes” for “past five years,” and this is what I got:


Yuck. Thanks for the free data, Google! I’m thankful for Nathan pointing this out. Good lesson in the benefits of sharing data so we can find problems like this — and the trouble with counting on non-open, private data providers like Google. When they’re good, they’re good, but they’re non-transparent and unaccountable when they’re not. It would be great if Google figured out what’s going on and fixed their public access tool. If anyone else can explain this I would be interested to hear.


Filed under Politics

5 responses to “Racist, sexist, and anti-Semitic jokes in Trump land”

  1. Phil Cowan

    Thanks for thinking about how to get data (even correlational data) on this topic


  2. myra

    Philip — one of my grad students is a skeptic about the reliability of Google Trends and did a nearly-immediate replication:

    Couldn’t help myself, sorry. I decided to partially replicate Phil Cohen’s Google Trend numbers — Not the graphs or regressions, but simply the raw Google Trends numbers that he linked to in his replication package (replication file link). I downloaded the relative search frequencies for “holocaust jokes” from Google Trends at approximately 11:00pm tonight according to the exact specifications that Phil used. I then compared my numbers to the relative search frequencies for “holocaust jokes” that Phil accessed at 1:20pm today (according to his excel file metadata).

    Over the course of 10 hours, I find (a) a mean difference in frequencies across all states of 2.2 points, and (b) a correlation of .987 between our frequencies.

    The differences were substantial. Only 4 out of 30 states have numbers that were the same. South Carolina had the highest difference in numbers (7) — Phil’s relative search frequency for South Carolina is 66, mine is 59.

    If Google Trends had daily temporal reliability, we would expect there to be no differences in our numbers and therefore perfect correlation. This clearly isn’t the case. I think this emphasizes the skepticism we should have about using Google Trends.

    I will email you the table he sent me, since I can’t attach it here


    • I just got the numbers again, and found they have changed again. Maybe my blog post created so much attention to Google searches for “holocaust jokes” that the “last five years” data actually moved, but more likely there is a reliability problem.

      I’m very glad your student checked this, and also that I made my extract available so I was alerted to this problem. It’s too bad Google is not similarly open with the data and methods. Of course, that’s not their business, providing data to researchers. So call it a lesson in the importance of public data collection and sharing. We can’t just assume that private companies — like Google, or Twitter (which people also use for this kind of research all the time) — are committed to a public-interest mission, because they’re not. Also, while we’re at it, don’t treat Google Scholar, Academia.edu, or ResearchGate, as public utilities whose goal is to serve the research community. We need to build genuinely open public alternatives (plug: SocArXiv.org).



  3. You also have to be careful of the ecological fallacy – that is, just because more people are googling knock-knock jokes in a particular state, and then the state votes a particular way, does not mean the knock-knock joke tellers were the ones pushing the votes. (I used this one example to avoid having to write words I do not care to write.) Perhaps the knock-knock joke advocates were outnumbered by knock-knock joke haters who were energized to vote for the OPPOSITE candidate because they were so tired of hearing the stupid jokes. … [This comes from an example decades ago, where it was noted that in an Old South state, the polling districts with the most blacks voted the most conservative, so the researcher concluded blacks were actually conservative. Another researcher pointed out that, at the time, blacks were effectively prevented from voting in those districts. In fact, the higher the percentage of blacks, the greater the degree of social control and the harder it was for blacks to vote. Thus, you end up yet again with the unsurprising result that Southern white voters were conservative …]

    I still loved the creativity.

