Wednesday, December 26, 2012

Journals of null results and the goal of replication

Here is my response to the following question that +Gary Marcus forwarded me from one of his readers:
Is there a place in experimental science for a journal dedicated to publishing "failed" experiments? Or would publication in a failed-studies journal be so ignominious for the scientists involved as not to be worthwhile? Does a "failed-studies" journal have any chance of success (no pun intended)?

Over the years, there have been a number of attempts to form "null results" journals. Currently, the field has the Journal of Articles in Support of the Null Hypothesis (there may well be others). As a rule, such journals are not terribly successful. They tend to become an outlet for studies that can't get published anywhere else. And, given that there are many reasons for failed replications, people generally don't devote much attention to them.

Journals like PLoS One have been doing a better job than many others of publishing direct replication attempts. They emphasize methodological soundness over theoretical novelty, which fits the goal of a journal that publishes replication attempts whether or not they work. There are also now websites that compile replication attempts (psychfiledrawer.org); the main goal of that site is to make researchers aware of existing replication attempts.

For me, there's a bigger problem with null results journals and websites: They treat replications as an attempt to make a binary decision about the existence of an effect. The replication either succeeds or fails, and there's no intermediate state of the world. Yet, in my view, the goal of replication should be to provide a more accurate estimate of the true effect, not to decide whether a replication is a failure or success.

Few replication attempts should lead to a binary succeed/fail judgment. Some will show the original finding to be a true false positive with no actual effect, but most will just show that the original study overestimated the size of the effect (I say "most" because publication bias ensures that many reported effects overestimate the true effects). The goal of replication should be to achieve greater and greater confidence in the estimate of the actual effect. Only with repeated replication can we zero in on that estimate. The larger the new study (e.g., the more subjects tested), the better the estimate.

The initiatives I'm pushing behind the scenes (more on those soon) are meant to encourage multiple replications using identical protocols in order to achieve a better estimate of the true effect. One failure to replicate is no more informative than one positive effect -- both results could be false. With repeated replication, though, we get a sense of what the real effect actually is.
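As a minimal sketch of what "zeroing in" looks like in practice (the studies, effect sizes, and sample sizes below are hypothetical, not from any real dataset), several replications run with a common protocol can be pooled with inverse-variance weights, and each added study narrows the confidence interval around the estimated effect:

```python
# A minimal sketch (hypothetical numbers) of fixed-effect, inverse-variance
# pooling: each added replication narrows the confidence interval around the
# pooled effect and pulls it away from an inflated original result.
import numpy as np

# Standardized effect sizes (Cohen's d) and per-group sample sizes for a
# hypothetical original study followed by four direct replications.
d = np.array([0.80, 0.25, 0.30, 0.18, 0.28])  # original study listed first
n = np.array([20, 80, 120, 60, 100])          # participants per group

# Approximate sampling variance of d for an equal-n, two-group design.
var_d = 2.0 / n + d**2 / (4.0 * n)
w = 1.0 / var_d                               # inverse-variance weights

pooled = np.sum(w * d) / np.sum(w)            # weighted mean effect size
se = np.sqrt(1.0 / np.sum(w))                 # standard error of the pooled estimate
low, high = pooled - 1.96 * se, pooled + 1.96 * se

print(f"pooled d = {pooled:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

With only the hypothetical original study, the interval is wide and centered on an inflated effect; adding the replications both shrinks the interval and pulls the estimate toward the smaller, more realistic values.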

4 comments:

  1. In truth, PsychFileDrawer's creators do not subscribe to the view that the outcome of the experiment is best thought of simply in binary terms ("significant" vs. "nonsignificant"), and our FAQs encourage people to use common sense (e.g., don't label a strong trend in the same direction as the original result a "failure to replicate" merely because it doesn't reach significance). But this informality does not provide any bright lines, unfortunately. We agree that the meta-analytic approach of synthesizing effect sizes and confidence intervals is generally a good way to go, but felt it would be inadvisable to require this when many investigators are not easily able to provide this information.

    Of course, if literatures contain pseudo-effects spawned by type-1 errors or p-hacking or fraud, then the synthetic mean may perpetuate confusion, while the more simple-minded conclusion "oops-- nothing there!" may sometimes be more on target.

    The big issue, though, is still incentivizing replications (here, PsychFileDrawer has done more to dramatize the problem than to solve it). Dan's approach has great promise not only to generate a balanced and smart review process for replication attempts, but also to incentivize doing replication attempts in the first place. And that in turn should help disincentivize publishing stuff that won't replicate. A virtuous cycle. Go Dan and Alex!

    Replies
    1. Thanks for the clarification. Yes, a tally of succeed/fail will help in cases when type-1 errors, p-hacking, and fraud produce a truly false positive. In those cases, there truly will be no "there" there. I guess I'm hopeful that true false positives are relatively rare, although there might well be some prominent cases.

      My hope is that, with multiple replications sharing a common protocol, we will achieve an accurate meta-analytic effect size estimate. Moreover, we can plot each effect size and its confidence interval graphically to illustrate how the effects cluster. If it turns out that the original effect really is a false positive (i.e., a true effect size of 0), that will be clear from the plot -- the original result will stand apart from the replication attempts, which should all cluster around an effect size of 0. True, the original false positive would elevate the mean effect size estimate, but the figure could make clear why. I think sites like PsychFileDrawer are the ideal place to publicize and highlight such cumulative results, to help make people aware of effects that others are struggling to replicate. People could also take their cue from PsychFileDrawer when developing their protocol and deciding which studies need a more precise effect size estimate.
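      To illustrate the kind of plot I have in mind, here is a rough sketch (my own, with made-up numbers, not anything PsychFileDrawer provides) that lists each study's effect size with a 95% confidence interval, so an outlying original result is visible against replications clustering near zero:

      ```python
      # Hypothetical effect sizes: an inflated original plus four replications.
      import numpy as np

      studies = ["original", "rep 1", "rep 2", "rep 3", "rep 4"]
      d = np.array([0.75, 0.05, -0.02, 0.08, 0.01])  # standardized effects (Cohen's d)
      n = np.array([25, 90, 110, 70, 95])            # per-group sample sizes

      se = np.sqrt(2.0 / n + d**2 / (4.0 * n))       # approximate SE of d, equal-n design
      low, high = d - 1.96 * se, d + 1.96 * se

      # Crude text "forest plot": one row per study, effect and 95% CI.
      for name, est, lo, hi in zip(studies, d, low, high):
          print(f"{name:>8}: d = {est:+.2f}  [{lo:+.2f}, {hi:+.2f}]")
      ```

      A real figure would draw these as points with error bars, but even the text version shows the original sitting well outside the cluster of replications.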

      I completely agree on the incentive cycle. I'll be posting more about that soon.

  2. Good piece. Two points: (1) I agree that forcing the outcomes of replication attempts into either a "success" or a "failure" bin is not very insightful and even a little stigmatizing. In our recent paper, http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0051382, we (almost completely) avoided using these labels, trying to take a more nuanced approach, which does not always make for effective communication. (2) I like the idea of "multiple replications using identical protocols." With our experiments, this would be extremely easy. We can share our data-collection and data-analysis programs. The experiment will take a couple of days tops to run and will only set you back a couple of hundred dollars. Experimenter effects will be practically nonexistent.

    Replies
    1. Thanks Rolf. It is challenging to avoid the succeed/fail terminology when discussing replications, but I think it's a useful exercise.

      I'm glad you like the idea of multiple replications using identical protocols. I'm hoping, once the details of the initiative I keep hinting about go public, your reaction will be the norm. I've received several other emails from people who hold the same perspective you do, so I'm hoping there will be a groundswell of interest. I would love to see labs make their code and methods public to facilitate replications. Once the initiatives launch (ideally in the next month or so), I'll post the details here and elsewhere. The more labs that lead by example, the better!
