Monday, September 24, 2012

The fog of data - secrecy and science

Brent Donnellan recently wrote this troubling blog post about his experiences in requesting the data from a published study that he and his colleagues were trying to replicate (and, in fact, failing to replicate). John Bargh provided the data from his original study (Bargh & Shalev, 2012), but demanded that Donnellan and colleagues keep the data confidential and not share it or their re-analyses with anyone. 

That demand flies in the face of both APA and NIH ethical guidelines. NIH requires grantees, at least for grants over $500k, to provide a data sharing plan and to stick to it. When the confidentiality of the subjects can be assured (as it almost certainly could be in this case), and the study is already published, there is no excuse for preventing other scholars from discussing the original data or identifying any problems they find. That’s how science is supposed to operate. If there were some mistake in the analysis or the data itself, and a re-analysis revealed that problem, it should be made public so that other scholars are aware of the concerns. Of course, it would be inappropriate to use someone else's data to generate new publications based on the data without the permission of the original authors, but that's not what Donnellan and colleagues wanted to do (see the comments on Donnellan's post).

Here’s what APA's ethics code states:
8.14 Sharing Research Data for Verification
(a) After research results are published, psychologists do not withhold the data on which their conclusions are based from other competent professionals who seek to verify the substantive claims through reanalysis and who intend to use such data only for that purpose, provided that the confidentiality of the participants can be protected and unless legal rights concerning proprietary data preclude their release. This does not preclude psychologists from requiring that such individuals or groups be responsible for costs associated with the provision of such information.
(b) Psychologists who request data from other psychologists to verify the substantive claims through reanalysis may use shared data only for the declared purpose. Requesting psychologists obtain prior written agreement for all other uses of the data.
Here’s what NSF states:
[NSF] expects PIs to share with other researchers, at no more than incremental cost and within a reasonable time, the data, samples, physical collections and other supporting materials created or gathered in the course of the work.

Here’s what NIH states:
The policy expects final research data, especially unique data, from NIH-supported research efforts to be made available to other investigators.

In this case, Donnellan and his students were trying to replicate a prominent, heavily cited and discussed paper claiming that shower temperature was highly correlated with reported loneliness (r=.57). The unusually large correlation between two loosely related measures led Donnellan and his colleagues to attempt a replication with a much larger sample. They found no effect.

The demand that Donnellan and his colleagues not reveal any aspect of Bargh’s data makes me highly concerned and suspicious about the original data. I respect Donnellan and colleagues for abiding by Bargh’s wishes, but Bargh has an ethical obligation to provide his data to any qualified researcher who wishes to view and re-analyze it. His refusal to do so should give anyone pause about the validity of his published result, especially given that this demand comes in the face of a large-scale failure to replicate that result. It seems that if his data are legitimate, he should be happy to make them public so that others in the field can see what he found and explore reasons why another lab might not produce the same results. What possible motive could he have to prevent another lab from discussing his data? If he's concerned that they will misrepresent what he found, then he can just make the same data available to everyone. Secrecy about data from a published study is inconsistent with the process of scientific discovery and debate.

A few months ago, Bargh wrote a vitriolic blog response to another failure to replicate one of his studies, accusing those researchers of incompetence. (I would post a link, but the offending posts were removed from Bargh's blog sometime in the past few weeks.) At the time, I blogged about Bargh’s response, arguing that it was a case study in how NOT to respond to a failure to replicate. Findings can fail to replicate for many reasons, not just due to the incompetence of those attempting the replication. Yes, replication failures are frustrating, but they are part of science. The proper response would be continued exploration to see if there are reasons for the inconsistent pattern of results. Now, when confronted with another failure to replicate, his response is to demand secrecy about his data. And, at almost the same time, the earlier blog posts railing against other investigators (and journalists and journals...) were removed, along with all the associated responses and critiques that appeared in the comments.

Science by obscurity is not acceptable science. Scientific data from published findings are not state secrets, and other labs and scientists who can’t reproduce your results are not villains (even if you disagree with them). As a field, we should be concerned whenever a scientist imposes a gag order on the public discussion of their data from published studies, strikes out against those who, for whatever reason, can’t replicate their results, or deletes public critiques of their arguments. Secrecy without explanation or justification is antithetical to the open discourse central to the scientific process. Openness is essential to seeking out the truth, and secrecy about published results leaves outsiders to wonder about the reason for all the fog.