Rural COVID-19 paper peer reviewed. OK?

Twelve days ago I posted my paper on the COVID-19 epidemic in rural US counties. I put it on the blog, and on the SocArXiv paper server. At this writing the blog post has been shared on Facebook 69 times, the paper has been downloaded 149 times, and it has been tweeted about by a handful of people. No one has told me it’s wrong yet, but no one has formally endorsed it yet, either.

Until now, that is. The paper, which I then submitted to the European Journal of Environment and Public Health, has now been peer reviewed and accepted. I’ve updated the SocArXiv version to the journal page proofs. Satisfied?

It’s a good question. We’ll come back to it.

Preprints

The other day (I think, not good at counting days anymore) a group of scholars published — or should I say posted — a paper titled, “Preprinting a pandemic: the role of preprints in the COVID-19 pandemic,” which reported that there have already been 16,000 scientific articles published about COVID-19, of which 6,000 were posted on preprint servers. That is, they weren’t peer-reviewed before being shared with the research community and the public. Some of these preprints are great and important, some are wrong and terrible, some are pretty rough, and some just aren’t important. This figure from the paper shows the preprint explosion:

[Figure from the paper: the rapid growth of COVID-19 preprints]

All this rapid scientific response to a worldwide crisis is extremely heartening. You can see the little sliver that SocArXiv (which I direct) represents in all that — about 100 papers so far (this link takes you to a search for the covid-19 tag), on subjects ranging from political attitudes to mortality rates to traffic patterns, from many countries around the world. I’m thrilled to be contributing to that, and really enjoy my shifts on the moderation desk these days.

On the other hand, some bad papers have gotten out there. Most notoriously, an erroneous paper comparing COVID-19 to HIV stoked conspiracy theories that the virus was deliberately created by evil scientists. It was quickly “withdrawn,” meaning no longer endorsed by the authors, but it remains available to read. More subtly, a study (by more famous researchers) done in Santa Clara County, California, claimed to find a very high rate of infection in the general population, implying COVID-19 has a very low death rate (good news!), but it was riddled with design and execution errors (oh well) and dogged by accusations of bias and corruption. And there have been others.

Less remarked upon has been the widespread reporting by major news organizations on preprints that aren’t as controversial but have become part of the knowledge base of the crisis. For example, the New York Times ran a report on this preprint on page 1, under the headline, “Lockdown Delays Cost at Least 36,000 Lives, Data Show” (which looks reasonable in my opinion, although the interpretation is debatable), and the Washington Post led with, “U.S. Deaths Soared in Early Weeks of Pandemic, Far Exceeding Number Attributed to Covid-19,” based on this preprint. These media organizations offer a kind of endorsement, too. How could you not find this credible?

Peer review

To help sort out the veracity or truthiness of rapid publications, the administrators of the bioRxiv and medRxiv preprint servers (who are working together) have added this disclaimer in red to the top of their pages:

Caution: Preprints are preliminary reports of work that have not been certified by peer review. They should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

That’s reasonable. You don’t want people jumping the gun on clinical decisions, or news reports. Unless they should, of course. And, on the other hand, lots of peer-reviewed research is wrong, too. I’m not compiling examples of this, but you can always consult the Retraction Watch database, which, for example, lists 130 papers published in Elsevier journals in 2019 that have been retracted for reasons ranging from plagiarism to “fake peer review” to forged authorship to simple errors. The database lists a few peer-reviewed COVID-19 papers that have already been retracted as well.

This comparison suggests that the standard of truthiness cannot be down to the simple dichotomy of peer reviewed or not. We need signals, but they don’t have to be that crude. In real life, we use a variety of signals for credibility that help determine how much to trust a piece of research. These include:

  • The reputation of the authors (their degrees, awards, twitter following, media presence)
  • The institutions that employ them (everyone loves to refer to these when they are fancy universities reporting results they favor, e.g., “the Columbia study showed…”)
  • Who published it (a journal, an association, a book publisher), which implies a whole secondary layer of endorsements (e.g., the editor of the journal, the assumed expertise of the reviewers, the prestige or impact factor of the journal as a whole, etc.)
  • Perceived conflicts of interest among the authors or publishers
  • The transparency of the research (e.g., are the data and materials available for inspection and replication)
  • Informal endorsements, from, e.g., people we respect on social media, or people using the Plaudit button (which is great and you should definitely use if you’re a researcher)
  • And finally, of course, our own assessment of the quality of the work, if it’s something we believe ourselves qualified to assess

As with the debate over the SAT/GRE for admissions, the quiet indicators sometimes do a lot of the work. Call something a “Harvard study” or a “New York Times report,” and people don’t often pry into the details of the peer review process.

Analogy: People who want to eat only kosher food need something to go on in daily life, and so they have erected a set of institutional devices that deliver such a seal (in fact, there are competing seal brands, but they all offer the same service: a yes/no endorsement by an organization one decides to trust). The seals cost money, which is added to the cost of the food; if people like it, they’re willing to pay. But, as God would presumably tell you, the seal should not always substitute for your own good judgment, because even rabbis or honest food producers can make mistakes. And when there is no good kosher inspection to rely on at all, you still have to eat — you just have to reason things through to the best of your ability. (In a pinch, maybe follow the guy with the big hat and see what he eats.) Finally, crucially for the analogy, anyone who tells you to ignore the evidence before you and always trust the authority that’s selling the dichotomous indicator is probably serving their own interests at least as much as they’re serving yours.

In the case of peer review, giant corporations, major institutions, and millions of careers depend on people believing that peer review is what you need to decide what to trust. And they also happen to be selling peer review services.

My COVID-19 paper

So should you trust my paper? Looking back at our list, you can see that I have degrees and some minor awards, some previous publications, some twitter followers, and some journalists who trust me. I work at a public research university that has its own reputation to protect. I have no apparent way of profiting from you believing one thing or another about COVID-19 in rural areas (I declared no conflicts of interest on the SocArXiv submission form). I made my data and code available (even if no one checks it, the fact that it’s there should increase your confidence). And of course you can read it.

And then I submitted it to the European Journal of Environment and Public Health, which, after peer review, endorsed its quality and agreed to publish it. The journal is published by Veritas Publications in the UK with the support of Tsinghua University in China. It’s an open access journal that has been publishing for only three years. It’s not indexed by Web of Science or listed in the Directory of Open Access Journals. It is, in short, a low-status journal. On the plus side, it has an editorial board of real researchers, albeit mostly at lower-status institutions. It publishes real papers, it (at least for now) doesn’t charge authors any publication fee, it does a little peer review, and it is fast. My paper was accepted in four days with essentially no revisions, after one reviewer read it (based on the summary, I believe they did read it). It’s open access, and I kept my copyright. I chose it partly because one of the papers I found on Google Scholar during my literature search was published there and it seemed OK.

So, now it’s peer reviewed.

Here’s a lesson: when you set a dichotomous standard like peer-reviewed yes/no and tell the public to trust it, you create the incentive for people to do the least they can to just barely get over that bar. This is why we have a giant industry of tens of thousands of academic journals producing products all branded as peer reviewed. Half a century ago, some academics declared themselves the gatekeepers of quality, and called their system peer review. To protect the authority of their expertise (and probably because they believed they knew best), they insisted it was the standard that mattered. But they couldn’t prevent other people from doing it, too. And so we have a constant struggle over what gets to be counted, and an effort to disqualify some journals with labels like “predatory,” even though it’s the billion-dollar corporations at the top of this system that are preying on us the most (along with lots of smaller scam purveyors).

In the case of my paper, I wouldn’t tell you to trust it much more because it’s in EJEPH, although I don’t think the journal is a scam. It’s just one indicator. But I can say it’s peer reviewed now and you can’t stop me.

Aside on service and reciprocity: Immediately after I submitted my paper, the EJEPH editors sent me a paper to review, which I respect. I declined the first because I wasn’t qualified to review it, and then they sent me another. This assignment I accepted. The paper was definitely outside my areas of expertise, but it was a small study quite transparently done, in Nigeria. I was able to verify important details — like the relevance of the question asked (from cited literature), the nature of the study site (from Google Maps and directories), the standards of measurement used (from other studies), the type of instruments used (widely available), and the statistical analysis. I suggested some improvements to the contextualization of the write-up and recommended publication. I see no reason why this paper shouldn’t be published with the peer review seal of approval. If it turns out to be important, great. If not, fine. Like my paper, honestly. I have to say, it was a refreshing peer review experience on both ends.

The COVID-19 epidemic in rural U.S. counties

I’ve been working on the COVID-19 epidemic in rural U.S. counties, and have now posted a paper on SocArXiv, here: https://osf.io/preprints/socarxiv/pnqrd/. Here’s the abstract, then some figures below:

Having first reached epidemic proportions in coastal metropolitan areas, COVID-19 has spread around the country. Reported case rates vary across counties from zero to 125 per thousand population (around a state prison in rural Trousdale County, Tennessee). Overall, rural counties are underrepresented relative to their share of the population, but a growing proportion of all daily cases and deaths have been reported in rural counties. This analysis uses daily reports for all counties to present the trends and distribution of COVID-19 cases and deaths in rural counties, from late March to May 16, 2020. I describe the relationship between population density and case rates in rural and non-rural counties. Then I focus on noteworthy outbreaks linked to prisons, meat and poultry plants, and nursing homes, many of which are linked to high concentrations of Hispanic, American Indian, and Black populations. The growing epidemic in rural counties is apparently driven by outbreaks concentrated in these institutional settings, which are conducive to transmission. The impact of the epidemic in rural areas may be heightened by their weaker health infrastructure and more vulnerable populations, especially with regard to age, socioeconomic status, and health conditions. As a result, the epidemic may contribute to the ongoing decline of health, economic, and social conditions in rural areas.

Here are COVID-19 cases in rural counties across the country. Note that the South, Mid-Atlantic, Michigan, and New England have the most (with fewer in the West and upper Midwest). When you look at cases per capita, you see the concentration in the South, plus a few isolated hotspots elsewhere.

[Figure 1: Maps of COVID-19 cases and cases per capita in rural counties]
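For the curious, the per-capita rates in the second map are a simple transformation of the counts in the first. Here is a minimal sketch of that calculation in Python with pandas; the file name and column names (county_fips, cases, population, rural) are hypothetical stand-ins, not the paper’s actual variables. The real code is linked at the end of the post.

```python
import pandas as pd

# Hypothetical input: one row per county, with cumulative case counts,
# population, and a rural indicator. Names are illustrative stand-ins,
# not the paper's actual variables.
counties = pd.read_csv("county_cases.csv")  # county_fips, cases, population, rural

# Cases per thousand population: the rate mapped in the second panel.
counties["cases_per_1000"] = 1000 * counties["cases"] / counties["population"]

# Rural counties only, sorted to surface extreme outliers like the prison
# outbreak in Trousdale County, Tennessee.
rural = counties[counties["rural"] == 1]
print(rural.sort_values("cases_per_1000", ascending=False).head(10))
```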

COVID-19 is still underrepresented in rural counties, but their share of the national burden is increasing, as they keep adding more than 2,000 cases and just under 100 deaths per day.

[Figure 2: New COVID-19 cases and deaths in rural counties over time]
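The rural share behind this figure is straightforward to compute from a long file of daily counts by county. Another sketch, under the same hypothetical naming assumptions:

```python
import pandas as pd

# Hypothetical long-format input: one row per county per day, with daily
# new cases and deaths and a rural indicator. Names are illustrative.
daily = pd.read_csv("daily_county_counts.csv", parse_dates=["date"])

# National totals and rural totals for each day, and the rural share of each.
totals = daily.groupby("date")[["new_cases", "new_deaths"]].sum()
rural_totals = daily[daily["rural"] == 1].groupby("date")[["new_cases", "new_deaths"]].sum()
rural_share = rural_totals / totals

print(rural_totals.tail())  # recently 2,000+ cases, just under 100 deaths per day
print(rural_share.tail())   # a rising share of the national burden
```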

Transmission dynamics are different in rural counties, which show a weaker relationship between population density and case rates. This suggests to me that more idiosyncratic factors are at work (prisons, meat plants, nursing homes), sites that concentrate vulnerable people.

[Figure 3: Population density and case rates in rural and non-rural counties]
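One way to see that weaker relationship is to fit the density slope separately for rural and non-rural counties, on log scales. A sketch using statsmodels, again with assumed variable names (including land_area, used here to compute density), not the paper’s actual specification:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

counties = pd.read_csv("county_cases.csv")  # as in the earlier sketch

# Log case rate and log population density; log1p tolerates zero-case counties.
counties["log_rate"] = np.log1p(1000 * counties["cases"] / counties["population"])
counties["log_density"] = np.log(counties["population"] / counties["land_area"])

# A flatter slope (and lower R-squared) among rural counties would be consistent
# with idiosyncratic, institution-driven outbreaks rather than density-driven spread.
for is_rural, group in counties.groupby("rural"):
    fit = smf.ols("log_rate ~ log_density", data=group).fit()
    label = "rural" if is_rural else "non-rural"
    print(label, round(fit.params["log_density"], 3), round(fit.rsquared, 3))
```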

These are the rural outbreak cases I identified, for which I could find obvious epidemic centers in institutions: prisons, meatpacking and poultry plants, and nursing homes. These 28 selected counties account for 15% of the rural burden.

[Figure 4: Cases in the selected rural outbreak counties]
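The 15% figure is just a ratio of sums: cases in those 28 counties over cases in all rural counties. A tiny self-contained sketch, with the FIPS list left as a placeholder to be filled in from the table below:

```python
import pandas as pd

counties = pd.read_csv("county_cases.csv")  # as in the earlier sketches
rural = counties[counties["rural"] == 1]

# Placeholder: the 28 selected counties' FIPS codes, from the table below.
outbreak_fips = []

outbreak = rural[rural["county_fips"].isin(outbreak_fips)]
print(outbreak["cases"].sum() / rural["cases"].sum())  # about 0.15 in the paper
```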

In addition to the institutional concentration, these outbreak cases also show distinct overrepresentation of Hispanic, American Indian, and Black populations. Here are some of the outbreak cases plotted against minority concentrations.

[Figure 5: Outbreak counties plotted against minority population concentrations]

And here’s a table of those selected cases:

[Table: Selected rural outbreak counties]

Lots more to be done, obviously. It’s a strong limitation to be restricted to case and death counts at the county level. Someone could go get lists of prisons and meatpacking plants and nursing homes and run them through this, etc. But I wanted to raise this issue substantively. By posting the paper on SocArXiv, without peer review, I’m offering it up for comment and criticism. Also, I’m sharing the code (which links to the data, all public): osf.io/wd2n6/. Messy but usable.
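If you want a running start before digging into that repository, here is one way to pull a public daily county-level series and convert cumulative counts to new daily counts. The New York Times county file is a well-known public source; treat this pointer as my assumption for illustration, not a statement of exactly which inputs the paper uses:

```python
import pandas as pd

# Public cumulative counts by county and day, from the New York Times repository.
url = "https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv"
us = pd.read_csv(url, parse_dates=["date"], dtype={"fips": str})

# Convert cumulative counts to daily new counts within each county.
# Rows without a FIPS code (aggregated geographies) would need special handling.
us = us.sort_values(["fips", "date"])
us["new_cases"] = us.groupby("fips")["cases"].diff().fillna(us["cases"])
us["new_deaths"] = us.groupby("fips")["deaths"].diff().fillna(us["deaths"])

print(us.tail())
```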

A related thought on writing a paper about COVID-19 right now: The lit review is daunting. There are thousands of papers, most on preprint servers. Is this bad? No. I use various tools to decide what’s reliable to learn from. If it’s outside my area, I’m more likely to rely on peer-reviewed journals, or on papers that are widely cited or reported. But the vast quantity available still helps me see what people are working on, and what terms and types of data they use. I learned a tremendous amount. Much respect to the thousands of researchers who are doing what they can to respond to this global crisis.