Saturday, January 25, 2014

Replication, Retraction, and Responsibility

Congrats/thanks to Brent Donnellan, Joe Cesario, and Rich Lucas for their tenacity and perseverance. They conducted 9 studies with more than 3000 participants in order to publish a set of direct replications. Their paper challenged an original report (study 1 in Bargh & Shalev, 2012) claiming that loneliness is correlated with preferred shower temperatures. The new, just-accepted paper did not find a reliable correlation. Donnellan describes their findings and the original studies in what may be the most measured and understated blog post I've seen. You should read it.

The original study had fewer than 100 subjects (51 from a Yale undergraduate sample and a replication with 41 from a community sample), underpowered to detect a typical effect size in a social psychology experiment. But there are bigger problems with the original results.
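To put "underpowered" in rough numbers, here is a minimal sketch of a power calculation for a correlation, using the Fisher z approximation. The "typical" effect of r = 0.2 is an assumption made for illustration, not a figure from Donnellan's post or the original paper, and the function names are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def approx_power(r, n, z_crit=1.959964):
    """Approximate two-sided power (alpha = .05) to detect a true
    correlation r with n subjects, via the Fisher z transformation."""
    shift = math.atanh(r) * math.sqrt(n - 3)
    return normal_cdf(shift - z_crit) + normal_cdf(-shift - z_crit)


ASSUMED_R = 0.2  # assumption: a "typical" effect size, chosen for illustration

# Sample sizes reported for the original study: 51 (Yale) and 41 (community).
for n in (51, 41):
    print(f"n = {n}: approximate power = {approx_power(ASSUMED_R, n):.2f}")
```

Under that assumed effect size, both samples land well below the conventional 80% power target. A different assumed effect would shift the numbers, but not the basic picture for samples this small.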

According to the description in Donnellan's post, the data from the Yale sample were completely screwy: 46/51 Yale students reported taking fewer than 1 shower/bath per week! Either Yale students are filthy, or something's wrong with the data. More critical for the primary question, 42/51 Yale students apparently prefer cold (24 students) or lukewarm (18 students) showers. How many people do you know who prefer cold showers to reasonably hot ones? Again, something's out of whack. In a comment on Donnellan's blog post, Rich Lucas noted that the original distribution of preferred temperatures would mirror what Donnellan et al found if the original data were inadvertently reverse coded. Of course, that would mean the correlation reported in the paper was backwards, and the effect was the opposite of what was claimed.
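Lucas's point about reverse coding is easy to check with arithmetic: flipping a scale leaves the size of a correlation unchanged and reverses its sign. A minimal sketch follows; the scores and the 1-to-5 scale are made up for illustration and are not the original data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def reverse_code(scores, low=1, high=5):
    """Flip a rating scale so that `low` becomes `high` and vice versa."""
    return [low + high - s for s in scores]


# Hypothetical data: loneliness scores and preferred water temperature
# on an assumed 1 (cold) to 5 (hot) scale.
loneliness = [1.2, 2.0, 2.5, 3.1, 3.8, 4.4]
temperature = [1, 2, 2, 3, 4, 5]

r = pearson(loneliness, temperature)
r_flipped = pearson(loneliness, reverse_code(temperature))

print(f"as coded:      r = {r:+.3f}")
print(f"reverse coded: r = {r_flipped:+.3f}")
```

The two correlations have the same magnitude and opposite signs, which is why an inadvertent reverse coding would mean the published effect points the wrong way.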

From an earlier Donnellan post, we know that Bargh was aware of these issues back in 2012, but that he prevented Donnellan and his colleagues from discussing the problems until recently (you should read that post too). In a semi-private email discussion among priming researchers and skeptics, Bargh claimed that his prohibition on discussing his data was just a miscommunication, but he didn't get around to correcting that misconception until he was pressed to respond on that email thread. In the same thread, Bargh claimed to have first learned of these errors from Joe Cesario (who initially requested the original data). Although it's odd that he didn't notice the weird distribution in the frequency responses, I can understand how someone might miss something obvious when they were focusing attention elsewhere... Bargh said that he provided an explanation to the editor at Emotion during the review process: He claimed that Yale students misunderstood the bathing frequency item as asking specifically about baths (not showers). According to Joe Cesario's response in that same email thread, though, that doesn't accord with the survey wording about showers/baths that Bargh provided.

Still, whenever and however Bargh learned of the problems with the data, he and Shalev had an obligation to retract the original study and issue an erratum (unless they actually believe Yale students prefer rare, cold showers). Even if the subjects misinterpreted the frequency question, the results are bogus. The problems could well have resulted from an honest oversight, a slip up in coding, a misinterpretation of a poorly worded question, or an Excel copy/paste error. Regardless of the reason, authors have a responsibility to own up to mistakes and to correct the record. Posting to a semi-private email list is not sufficient—the public record in the journal must be amended. Authors have an obligation to correct mistakes once they know of them, and the failure to do so in the published record is troubling.

Note that I am not arguing the original study should be retracted just because Donnellan and colleagues didn't replicate it. A failure to replicate is not adequate grounds for questioning the integrity of the original finding. The original effect size estimates could be wrong, but that's just science working properly to correct itself (that's why direct replications are useful and important). Yet, obviously flawed data like those described by Donnellan should not have to await replication, and scholars reading the literature should be informed that they should not place any stock in that first study with Yale students. That finding should be withdrawn until it can be verified so that it doesn't mislead the field.

One thing I find troubling about this story is that Donnellan, Cesario, and Lucas needed to conduct 9 studies with more than 30x the original number of participants in order to get this paper accepted at Emotion. They should be applauded for replicating with enough power to be sure that their effect size estimates are precise, but each of their studies had more than 2.5x the sample size of the original! If their efforts were entirely voluntary and not a consequence of appeasing reviewers, kudos to them for making sure they got it right. I'm glad that this paper was accepted, and our field owes them gratitude for their efforts. I just hope they haven't set an overly high standard and precedent for what's needed to publish a direct replication.

I would encourage Bargh to issue a public explanation (accessible to the whole field, not just an email thread) for the data issues in their original study. The problems could well have been an accidental coding or interpretation problem, and mistakes are excusable even if they do undermine the claims. More importantly, he should retract the original study (not the whole paper, necessarily -- just the study with problematic data) and issue an erratum in the journal. Out of curiosity, I would like to see an explanation for why the study was not retracted immediately upon learning of the problems more than a year ago. Perhaps there is a good reason, but I'm having trouble generating one. I hope he will enlighten us.