The impact of Impact Factors
Some of this first section is lifted from my blockbuster report, Scholarly Communication in Sociology, where you can also find the references.
When a piece of scholarship is first published it’s not possible to gauge its importance immediately unless you are already familiar with its specific research field. One of the functions of journals is to alert potential readers to good new research, and the placement of articles in prestigious journals is a key indicator.
Since at least 1927, librarians have been using the number of citations to the articles in a journal as a way to decide whether to subscribe to that journal. More recently, bibliographers introduced a standard method for comparing journals, known as the journal impact factor (JIF). This requires data for three years, and is calculated as the number of citations in the third year to articles published over the two prior years, divided by the total number of articles published in those two years.
For example, in American Sociological Review there were 86 articles published in the years 2017-18, and those articles were cited 548 times in 2019 by journals indexed in Web of Science, so the JIF of ASR is 548/86 = 6.37. This allows for a comparison of impact across journals. Thus, the comparable calculation for Social Science Research is 531/271 = 1.96, and it’s clear that ASR is a more widely cited journal. However, comparisons of journals in different fields using JIFs are less helpful. For example, the JIF for the top medical journal, New England Journal of Medicine, is currently 75, because there are many more medical journals, publishing and citing more articles at higher rates, and more quickly, than sociology journals do. (Or maybe NEJM is just that much more important.)
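If it helps to see the arithmetic spelled out, here is a minimal Python sketch of the calculation described above. The function name journal_impact_factor is just a label I made up for illustration (it is not anything Web of Science provides), and the counts are the ASR and Social Science Research figures quoted in this post.

```python
def journal_impact_factor(citations_in_year3: int, articles_in_years_1_2: int) -> float:
    """Citations in year three to articles from the two prior years, per article."""
    if articles_in_years_1_2 <= 0:
        raise ValueError("need at least one article in the two-year window")
    return citations_in_year3 / articles_in_years_1_2


# Figures quoted in the text: 2019 citations to articles published in 2017-18.
asr = journal_impact_factor(548, 86)
ssr = journal_impact_factor(531, 271)
print(f"ASR: {asr:.2f}  SSR: {ssr:.2f}")
```

Note that the denominator is a count of articles, so a journal that publishes many rarely cited pieces alongside a few hits can end up with the same JIF as one whose articles are all cited moderately.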
In addition to complications in making comparisons, there are problems with JIFs (besides the obvious limitation that citations are only one possible evaluation metric). They depend on what journals and articles are in the database being used. And they mostly measure short-term impact. Most important for my purposes here, however, is that they are often misused to judge the importance of articles rather than journals. That is, if you are a librarian deciding what journal to subscribe to, JIF is a useful way of knowing which journals your users might want to access. But if you are evaluating a scholar’s research, knowing that they published in a high-JIF journal does not mean that their article will turn out to be important. It is especially wrong to look at an article that’s old enough to have citations you could count (or not) and judge its quality by the journal it’s published in — but people do that all the time.
To illustrate this, I gathered citation data from the almost 2,500 articles published in 2016-2019 in 15 sociology journals from the Web of Science category list.* In JIF these rank from #2 (American Sociological Review, 6.37) to #46 (Social Forces, 1.95). I chose these to represent a range of impact factors, and because they are either generalist journals (e.g., ASR, Sociological Science, Social Forces) or sociology-focused enough that almost any article they publish could have been published in a generalist journal as well. Here is a figure showing the distribution of citations to those articles as of December 2020, by journal, ordered from higher to lower JIF.
After ASR, Sociology of Education, and American Journal of Sociology, it’s hard to see much of a slope here. Outliers might be playing a big role (for example that very popular article in Sociology of Religion, “Make America Christian Again: Christian Nationalism and Voting for Donald Trump in the 2016 Presidential Election,” by Whitehead, Perry, and Baker in 2018). But there’s a more subtle problem, which is the timing of the measures. My collection of articles is 2016-2019. The JIFs I’m using are from 2019, based on citations to 2017-2018 articles. These journals bounce around; for example, Sociology of Religion jumped from 1.6 to 2.6 in 2019. (I address that issue in the supplemental analysis below.) So what is a lazy promotion and tenure committee, which is probably working off a mental reputation map at least a dozen years old, to do?
You can already tell where I’m going with this: In these sociology journals, there is so much noise in citation rates within the journals, compared to any stable difference between them, that outside the very top the journal ranking won’t much help you predict how much a given paper will be cited. If you assume a paper published in AJS will be more important than one published in Social Forces, you might be right, but if the odds that you’re wrong are too high, you just shouldn’t assume anything. Let’s look closer.
Sociology failure rates
I recently read this cool paper (also paywalled in the Journal of Informetrics) that estimates this “failure probability,” the odds that your guess about which paper will be more impactful based on the journal title turns out to be wrong. When JIFs are similar, the odds of an error are very high, like a coin flip. “In two journals whose JIFs are ten-fold different, the failure probability is low,” Brito and Rodríguez-Navarro conclude. “However, in most cases when two papers are compared, the JIFs of the journals are not so different. Then, the failure probability can be close to 0.5, which is equivalent to evaluating by coin flipping.”
Their formulas look pretty complicated to me, so for my sociology approach I just did it by brute force (or if you need tenure you could call it a Monte Carlo approach). For each possible pair of journals, I randomly drew one article from each journal 100,000 times, then calculated the percentage of times the article with more citations was from the journal with the higher impact factor. For example, in 100,000 comparisons of random pairs sampled from ASR and Social Forces (the two journals with the biggest JIF spread), 73% of the time the ASR article had more citations.
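Here is a rough Python sketch of that brute-force comparison. It is not the Stata code linked in the footnote, the function names and the three toy journals are made up for illustration, and it assumes that ties count as failures (the description above doesn't say how ties are handled).

```python
import random
from itertools import combinations


def match_rate(higher_jif, lower_jif, n_draws=100_000, rng=None):
    """Share of random article pairs in which the article from the
    higher-JIF journal has strictly more citations.
    Assumption: a tie counts as a failure."""
    rng = rng or random.Random()
    wins = 0
    for _ in range(n_draws):
        # Draw one article (its citation count) from each journal.
        if rng.choice(higher_jif) > rng.choice(lower_jif):
            wins += 1
    return wins / n_draws


def all_match_rates(journals, n_draws=100_000, seed=None):
    """journals: list of (name, citation_counts), ordered from higher to lower JIF."""
    rng = random.Random(seed)
    rates = {}
    # combinations() keeps list order, so the first journal in each pair
    # is always the one with the higher JIF.
    for (name_hi, cites_hi), (name_lo, cites_lo) in combinations(journals, 2):
        rates[(name_hi, name_lo)] = match_rate(cites_hi, cites_lo, n_draws, rng)
    return rates


# Made-up citation counts for three hypothetical journals, highest JIF first.
toy = [
    ("Journal A", [0, 2, 3, 5, 8, 13, 40]),
    ("Journal B", [0, 1, 2, 4, 6, 9, 15]),
    ("Journal C", [0, 0, 1, 2, 3, 5, 30]),
]
for pair, rate in all_match_rates(toy, n_draws=10_000, seed=1).items():
    print(pair, f"{rate:.0%}")
```

Fifteen journals make 105 possible pairs, so 100,000 draws per pair comes to 10.5 million comparisons. With lists this small you could also compare every article against every other article and skip the sampling entirely.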
Is 73% a lot? It’s better than a coin toss, but I’d hate to have a promotion or hiring decision be influenced by an instrument that blunt. Here are results of the 10.5 million comparisons I made (I love computers).
Outside of the ASR column, these are very bad; in the ASR column they’re pretty bad. For example, a random article from AJS has more citations than one from the 12 lower-JIF journals only 59% of the time. So if you’re reading CVs, and you see one candidate with a two-year-old AJS article and one with a two-year-old Work & Occupations article, what are you supposed to do? You could compare the actual citations the two articles have gotten, or you could assess their quality or impact some other way. You absolutely should not just skim the CV and assume the AJS article is or will be more influential based on the journal title alone; the failure probability of that assumption is too high.
On my table you can also see some anomalies, of the kind which plague this system. See all that brown in the BJS and Sociology of Religion columns? That’s because both of those journals had sudden increases in their JIF, so their more recent articles have more citations, and most of the comparisons in this table (like in your memory, probably) are based on data from a few years before that. People who published in these journals three years ago are today getting an undeserved JIF bounce from having these titles on their CVs. (See the supplemental analysis below for more on this.)
Using JIF to decide which papers in different sociology journals are likely to be more impactful is a bad idea. Of course, lots of people know JIF is imperfect, but they can’t help themselves when evaluating CVs for hiring or promotion. And when you show them evidence like this, they might say “but what is the alternative?” But as Brito & Rodríguez-Navarro write: “if something were wrong, misleading, and inequitable the lack of an alternative is not a cause for continuing using it.” These error rates are unacceptably high.
In sociology most people won’t own up to relying on impact factors, but most people (in my experience) do judge research by where it’s published all the time. If there is a very big difference in status — enough to be associated with an appreciably different acceptance rate, for example — that’s not always wrong. But it’s a bad default.
In 2015 the biologist Michael Eisen suggested that tenured faculty should remove the journal titles from their CVs and websites, and just give readers the title of the paper and a link to it. He’s done it for his lab’s website, and I urge you to look at it just to experience the weightlessness of an academic space where for a moment overt prestige and status markers aren’t telling you what to think. I don’t know how many people have taken him up on it. I did it for my website, with the explanation, “I’ve left the titles off the journals here, to prevent biasing your evaluation of the work before you read it.” Whatever status I’ve lost I’ve made up for in virtue-signaling self-satisfaction — try it! (You can still get the titles from my CV, because I feel like that’s part of the record somehow.)
Finally, I hope sociologists will become more sociological in their evaluation of research — and of the systems that disseminate, categorize, rank, and profit from it.
Supplemental analysis
The analysis thus far is, in my view, a damning indictment of real-world reliance on the Journal Impact Factor for judging articles, and thus the researchers who produce them. However, it conflates two problems with the JIF. First is the statistical problem of imputing status from an aggregate to an individual, when the aggregate measure fails to capture variation that is very wide relative to the difference between groups. Second, more specific to JIF, is the reliance on a very time-specific comparison: citations in year three to publications in years one and two. Someone could do (maybe already has) an analysis to determine the best lag structure for JIF to maximize its predictive power, but the conclusions from the first problem imply that’s a fool’s errand.
Anyway, in my sample the second problem is clearly relevant. My analysis relies strictly on the rank-ordering provided by the JIF to determine whether article comparisons succeed or fail. However, the sample I drew covers four years, 2016-2019, and counts citations to all of them through 2020. This difference in time window produces a rank ordering that differs substantially (the rank order correlation is .73), as you can see:
In particular, three journals (BJS, SOR, and SFO) moved more than five spots in the ranking. A glance at the results table above shows that these journals are dragging down the matching success rate. To pull these two problems apart, I repeated the analysis using the ranking produced within the sample itself.
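For anyone who wants to check a rank-order correlation like the .73 mentioned above, here is a small Python sketch. I'm assuming a Spearman correlation, which may or may not be the statistic behind that number, and the two rankings below are invented for illustration rather than taken from the data.

```python
def spearman_from_ranks(ranks_a, ranks_b):
    """Spearman rank correlation for two rankings of the same items.

    Uses the shortcut formula 1 - 6*sum(d^2) / (n*(n^2 - 1)), which is
    only valid when neither ranking contains ties.
    """
    if len(ranks_a) != len(ranks_b):
        raise ValueError("rankings must cover the same items")
    n = len(ranks_a)
    if n < 2:
        raise ValueError("need at least two items")
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks for five journals under two orderings (not the real data).
jif_rank = [1, 2, 3, 4, 5]
sample_rank = [1, 3, 2, 5, 4]
print(round(spearman_from_ranks(jif_rank, sample_rank), 2))
```

Because the formula squares each rank difference, one journal jumping five or more spots pulls the correlation down much more than several journals swapping places with a neighbor.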
The results are now much more straightforward. First, here is the same box plot but with the new ordering. Now you can see the ranking more clearly, though you still have to squint a little.
And in the match rate analysis, the result is now driven by differences in means and variances rather than by the mismatch between JIF and sample-mean rankings.
This makes a more logical pattern. The most differentiated journal, ASR, has the highest success rate, and the journals closest together in the ranking fail the most. However, please don’t take from this that such a ranking becomes a legitimate way to judge articles. The overall average on this table is still only 58%, up only 4 points from the original table. Even with a ranking that more closely conforms to the sample, this confirms Brito and Rodríguez-Navarro’s conclusion: “[when rankings] of the journals are not so different … the failure probability can be close to 0.5, which is equivalent to evaluating by coin flipping.”
These match numbers are too low to responsibly use in such a way. These major sociology journals have citation rates that are too variable, and too similar at the mean, to be useful as a way to judge articles. ASR stands apart, but only because of the rest of the field. Even judging an ASR paper against its lower-ranked competitors produces a successful one-to-one ranking of papers just 72% of the time — and that only rises to 82% with the least-cited journal on the list.
The supplemental analysis is helpful for differentiating the multiple problems with JIF, but it does nothing to solve the problem of using journal citation rates to evaluate individual articles.
*The data and Stata code I used are up here: osf.io/zutws. This includes the lists of all articles in the 15 journals from 2016 to 2020 and their citation counts as of the other day (I excluded 2020 papers from the analysis, but they’re in the lists). I forgot to save the version of the 100k-case random file that I used to do this, so I guess that can never be perfectly replicated; but you can probably do it better anyway.