Wednesday, December 26, 2012

Journals of null results and the goal of replication

Here is my response to the following question that +Gary Marcus forwarded me from one of his readers:
Is there a place in experimental science for a journal dedicated to publishing "failed" experiments? Or would publication in a failed-studies journal be so ignominious for the scientists involved as not to be worthwhile? Does a "failed-studies" journal have any chance of success (no pun intended)?

Over the years, there have been a number of attempts to form "null results" journals. Currently, the field has the Journal of Articles in Support of the Null Hypothesis (there may well be others). As a rule, such journals are not terribly successful. They tend to become an outlet for studies that can't get published anywhere else. And, given that there are many reasons for failed replications, people generally don't devote much attention to them.

Journals like PLoS One have been doing a better job than many others in publishing direct replication attempts. They emphasize experimental accuracy over theoretical contributions, which fits the goal of a journal that publishes replication attempts whether or not they work. There are also websites that compile replication attempts (e.g., psychfiledrawer.org). The main goal of that site is to make researchers aware of existing replication attempts.

For me, there's a bigger problem with null results journals and websites: They treat replication as an attempt to make a binary decision about the existence of an effect. The replication either succeeds or fails, and there's no intermediate state of the world. Yet, in my view, the goal of replication should be to provide a more accurate estimate of the true effect, not to decide whether a replication is a failure or success.

Few replication attempts should lead to a binary succeed/fail judgment. Some will show the original finding to be a false positive with no actual effect, but most will just find that the original study overestimated the size of the effect (I say "most" because publication bias ensures that many reported effects overestimate the true effects). The goal of replication should be to achieve greater and greater confidence in the estimate of the actual effect. Only with repeated replication can we zero in on the true effect size. The larger the new study (e.g., more subjects tested), the better the estimate.
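To make that logic concrete, here's a minimal simulation sketch of my own (not from any published study) showing why published effects tend to be inflated and why a larger replication sharpens the estimate. The true effect size, sample sizes, and the p < .05 selection rule are all hypothetical choices for illustration:

```python
# Hypothetical simulation: publication bias inflates published effect sizes,
# and a larger replication estimates the true effect more precisely.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
TRUE_D = 0.2      # assumed true effect size (Cohen's d)
N_ORIGINAL = 20   # per-group n in the small "original" studies

def run_study(n, true_d):
    """Simulate a two-group study; return observed d and p-value."""
    control = rng.normal(0.0, 1.0, n)
    treatment = rng.normal(true_d, 1.0, n)
    _, p = stats.ttest_ind(treatment, control)
    pooled_sd = np.sqrt((control.var(ddof=1) + treatment.var(ddof=1)) / 2)
    d = (treatment.mean() - control.mean()) / pooled_sd
    return d, p

# Publication bias: only studies reaching p < .05 get published.
published = [d for d, p in (run_study(N_ORIGINAL, TRUE_D) for _ in range(5000))
             if p < 0.05]
print(f"true d = {TRUE_D}, mean published d = {np.mean(published):.2f}")

# One large replication: the standard error of d shrinks roughly as 1/sqrt(n),
# so the estimate lands much closer to the truth.
d_rep, _ = run_study(500, TRUE_D)
print(f"large replication (n=500/group): d = {d_rep:.2f}")
```

Running this, the mean published effect from the small studies comes out far above the true value, while the single large replication hovers near it, which is exactly the point: the interesting question is "how big," not "yes or no."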

The initiatives I'm pushing behind the scenes (more on those soon) are designed to encourage multiple replications using identical protocols in order to achieve a better estimate of the true effect. One failure to replicate is no more informative than one positive effect -- both results could be false. With repeated replication, though, we get a sense of what the real effect actually is.
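One standard way to combine identical-protocol replications is a fixed-effect (inverse-variance) meta-analysis. Here's a hedged sketch of that pooling step; the study effect sizes and sample sizes below are made up, and the standard-error formula for Cohen's d is the usual large-sample approximation:

```python
# Pool one original study plus several identical-protocol replications
# into a single, more precise estimate of the effect (fixed-effect model).
import numpy as np

# (observed d, per-group n): a small original study and four
# hypothetical larger replications of the same protocol
studies = [(0.62, 20), (0.25, 80), (0.18, 80), (0.31, 80), (0.22, 80)]

def se_of_d(d, n):
    """Approximate standard error of Cohen's d for two groups of size n."""
    return np.sqrt(2 / n + d**2 / (4 * n))

# Inverse-variance weights: more precise studies count for more.
weights = np.array([1 / se_of_d(d, n)**2 for d, n in studies])
effects = np.array([d for d, _ in studies])

pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1 / np.sum(weights))
print(f"pooled d = {pooled:.2f} +/- {1.96 * pooled_se:.2f} (95% CI)")
```

The pooled estimate sits well below the flashy original result, with a confidence interval narrow enough to be useful. That's the payoff of treating replication as estimation rather than as a pass/fail verdict.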