Monday, December 15, 2014

Response from Ballesteros et al to my HI-BAR

Update 12-15-14: I used strikethrough to correct a couple of the F test notes below. The crossed-out ones were actually fine.

Update 5-26-15: Frontiers has published a correction from Ballesteros et al that acknowledges the overlap between their papers. It doesn't acknowledge that the overlap was inappropriate. It mostly echoes their response below, but it does not address my questions about that response.

In late November, I posted a HI-BAR review of a paper by Ballesteros et al (2014) that appeared in Frontiers. In addition to a number of other issues I discussed there and in an earlier review of another paper, I raised the concern that the paper republished the training data and some outcome measures from an earlier paper in PLoS One without adequately conveying that those data had been published previously. I also noted this concern in a few comments on the Frontiers website. In my original HI-BAR post, I asked the authors for clarification about these issues.

I have now received their response as a PDF file. You can read it here.

Below I quote some sections from their reply and comment on them. I encourage you to read the full reply from the original authors as I am addressing only a subset of their comments below. For example, their reply explains the different author lists on the two papers in a satisfactory way, I think. Again, I would be happy to post any responses from the original authors to these comments. As you will see, I don't think this reply addresses the fairly major concerns about duplication of methods and results (among other issues).

Each quote from the authors' reply is followed by my response.

As you noted, both papers originated from the same randomized controlled intervention study (clinical trial number: NCT02007616). We never pretended that the articles were to be considered as two different intervention studies. Part of this confusion could have been generated because in the Frontiers article the clinical trial number did not appear on the first page, even though this number was included in the four preliminary versions of the manuscript under revision. We have contacted Frontiers asking them to include the clinical trial number in the final accepted version, if possible. 
Although that would help, it's not sufficient. The problem is that the data analyses are republished. It would be good to note, explicitly, both in the method section and in the results section, that this paper is reporting outcome measures from the same intervention. And, it's essential to note when and where analyses are repeated.
If it is not possible at this stage, we asked them to publish a note with this information and to acknowledge in the results section the overlap as well as in Figure 3b, mentioning that the data in the Figure were published previously in PLoS.
This seems necessary regardless of whether or not the link to the clinical trial number is added. Even if it is made clear that the paper is based on the same intervention, it still is not appropriate to republish data and results without explicitly noting that they were published previously. Actually, it would be better not to republish the data and analyses. Period.
You also indicated in your first comments posted in Frontiers that the way we reported our results could give the impression that they come from two different and independent interventions. To avoid this misunderstanding, as you noticed, we introduced two references in our Frontiers' article. We inserted the first reference to the PLoS article in the Method section and a second reference in the Discussion section. Two references that you considered were not enough to avoid possible confusions in the reader.
As I discussed in my HI-BAR post, these citations were wholly inadequate. One noted only that the results of the oddball task were reported elsewhere, yet the same results were reported again in Frontiers, and the results section included no mention of the duplication. The other citation, appearing in the discussion, implied that the earlier paper provided additional evidence for a conclusion about the brain. Nowhere did the Frontiers paper cite the PLoS paper for the details of the intervention, the nature of the outcome measures, etc. It just published those findings as if they were new. The text itself should have noted, in both the method and results sections, whenever procedures or results had been published elsewhere. Again, it would have been better not to republish them at all.
In relation to the results section in which we describe the cross-modal oddball attention task results in the Frontiers article, we acknowledge that, perhaps, it would have been better to avoid a detailed presentation of the attentional results that were already published and indicate just that attentional data resulting from the task developed in collaboration with the UBI group were already published. We could have asked the readers to find out for themselves what happened with attention after training. We were considering this possibility but in the end we decided to include the results of the oddball task to facilitate ease of reading.
Acknowledging the repetition explicitly in the text would have helped avoid the interpretation that these were new results. Repeating the statistics from the earlier paper isn't an "ease of reading" issue -- it's duplicate publication. You could easily summarize the findings of the earlier paper, with clear citation, and note that the details were described in that paper. I don't see any justification for republishing the actual statistics and findings. 
Regarding the last paragraph of the oddball results section, we said “New analyses were conducted....” As we said above, we tried (perhaps in an unfortunate way) to give in this paper a brief summary of the results obtained in the attention task, so this paragraph refers to the epigraphs “Distraction” and “Alertness” of results section in PLoS publication, where differential variables were calculated and new analyses were performed to obtain measures of gains and losses in the ability to ignore infrequent sounds and to take advantage of the frequent ones. Once again, we apologize if this sentence has led to confusion, and we are in contact with the Journal concerning this.
Yes, that sentence added to the impression that these analyses were new to the Frontiers paper. But the statistical analyses should not have been duplicated in the first place. That's also true for the extensive repetition of all of the training data.
Another comment in your exhaustive review referred to the differences in the samples shown in the Consort diagram between the two publications. This has a simple explanation. The diagram of the PLoS article refers only to the cross-modal oddball task designed to assess alertness and distraction while the Frontier's Consort diagram refers to the whole intervention study. In the control group, one of the participants was removed due to the large number of errors in the oddball task, but he was included in the rest of the study in which he reached inclusion criteria. The same occurred in the trained group. As attentional measures were analyzed separately by the UBI group, by the time we sent the results only fifteen participants completed the post-evaluation (we could not contact a participant and the other was travelling that week). A few days later, these two participants completed the post-evaluation, but we decided not to include them in the PLoS sample as the data were already analyzed by the UBI group.
Thank you for these clarifications. I'm curious why you decided not to wait a few days for those remaining participants if they were part of your intervention design. If their results came in a few days later, why not redo the analysis in the original paper to include all participants who were in the intervention? Presumably, by publishing the PLoS paper when you did, you deemed the intervention to be complete (i.e., it wasn't flagged as a preliminary result). It seems odd to then add these participants to the next paper. This difference between papers raises two questions. First, would the results for the PLoS paper have been different with those two participants? Second, would the results of the Frontiers paper have differed if they had been excluded? And if those data were in by the time the Frontiers paper was written, why were these participants not included in the oddball analysis? At least that would have added new information to those duplicated analyses.

This clarification should have been included in the Frontiers paper to make it clearer that both papers reported the same intervention with the same participants.
We would like to explain the clinical trial registration process. As you pointed out in your comments to Mayas et al. (2014), we registered the clinical trial after the attention manuscript was submitted to PLoS. The journal (PlosOne) specifically required the registration of the intervention study as a clinical trial during the revision process in order to publish the manuscript, and told us about the possibility of post-registration. The Editor of PLoS sent us the link to register the study.
Post-registering a study completely undermines the purpose of registration. I find it disturbing that PLoS would permit that as an option for a clinical trial. Looking at the PLoS guidelines, they do make an exception for post-study registration provided that the reasons for "failing to register before participant recruitment" are spelled out clearly in the article (emphasis from original on PLoS website): 
PLOS ONE supports prospective trial registration (i.e. before participant recruitment has begun) as recommended by the ICMJE's clinical trial registration policy. Where trials were not publicly registered before participant recruitment began, authors must:
  • Register all related clinical trials and confirm they have done so in the Methods section
  • Explain in the Methods the reason for failing to register before participant recruitment
It's also clear that their policy is for clinical trials to be pre-registered, not post-registered. And the exception for post-registration requires a justification. Neither the PLoS paper nor the Frontiers paper provided any such justification. The Frontiers paper didn't mention registration at all, and the PLoS paper didn't make clear that the registration occurred after submission of the finished study.

The idea of registration is to specify, in advance, the procedures that will be followed in order to avoid introducing investigator degrees of freedom. Registering after the fact does nothing to address that problem. It's just a re-description of an already completed study. It's troubling that a PLoS editor would instruct the authors to post-register a study.
We would like to clarify some questions related to the data analysis. First, multiple tests in all ANOVAs were Bonferroni corrected although it is not made explicit in the text. 
As best I can tell, the most critical multiple testing issues, the ones that could have been partly remedied by pre-registration, were not corrected in this paper. 

There are at least four distinct multiple testing issues:

  1.  There are a large number of outcome measures, and as best I can tell, none of the statistical tests were corrected for the number of tests conducted across tasks. 
  2. There are multiple possible tests for a number of the tasks (e.g., wellbeing has a number of sub-scales), and there don't appear to have been corrections for the number of distinct ways in which an outcome could be measured. 
  3. A multi-factor ANOVA itself constitutes multiple tests (e.g., a 2x2 ANOVA involves 3 tests: each main effect and the interaction). Few studies correct for that multiple testing problem, and this paper does not appear to have done so. 
  4. There is a multiple tests issue with pairwise comparisons conducted to follow up a significant ANOVA. I assume those are the Bonferroni-corrected tests that the authors referred to above. However, it's impossible to tell whether these tests were corrected because the paper did not report the test statistics; it just reported p < xx or p = xx. Only the F tests for the main effects and interactions were reported.
If the authors did correct for the first three types of multiple testing issues, perhaps they can clarify. However, based on the ANOVA results, it does not appear that they did.
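To put rough numbers on why these issues matter, here is a minimal Python sketch of the familywise error rate and the Bonferroni correction. It assumes the tests are independent and that every null hypothesis is true; under those assumptions, the chance of at least one false positive among k tests run at α is 1 − (1 − α)^k. The tests in a study like this one are surely correlated, so the exact uncorrected rate would differ, but dividing α by k (Bonferroni) keeps the familywise rate at or below α regardless of dependence. The function names and the example p values are hypothetical, for illustration only.

```python
def familywise_error_rate(alpha: float, k: int) -> float:
    """Chance of at least one false positive among k independent tests
    when every null hypothesis is true (independence is an assumption)."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_alpha(alpha: float, k: int) -> float:
    """Per-test threshold that keeps the familywise error rate at or below alpha."""
    return alpha / k


def bonferroni_reject(p_values, alpha=0.05):
    """Return True for each p value that survives a Bonferroni correction
    across all of the tests in p_values."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return [p <= threshold for p in p_values]


if __name__ == "__main__":
    # A 2x2 ANOVA yields 3 tests: two main effects and the interaction.
    print(familywise_error_rate(0.05, 3))  # ≈ 0.143, not 0.05
    # Hypothetical p values for three tests; only the first survives (threshold ≈ 0.0167).
    print(bonferroni_reject([0.01, 0.02, 0.04]))
```

The same arithmetic compounds across the other issues: every additional outcome measure, sub-scale, or follow-up comparison adds to k.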

A related issue, one mentioned in my HI-BAR on the PLoS paper but not on the Frontiers paper, is that some of the reported significance levels for the F tests are incorrect. Here are some examples from the Frontiers paper in which the reported significance level (p<xx or p=xx) is wrong. For each, I give the correct p value (uncorrected for multiple tests) after the bracketed result. None of these corrections changes whether a result is statistically significant at p < .05:

  • [F(1, 28) = 3.24, MSE = 1812.22, p = 0.057, η2p = 0.12]. p=.0826
  • [F(2, 50) = 5.52, MSE = 0.001, p < 0.005, η2p = 0.18]. p=.0068
  • [F(1, 28) = 4.35, MSE = 0.09, p < 0.001, η2p = 0.89]. p=.0462
  • [F(1, 28) = 17.98, MSE = 176.74, p < 0.001, η2p = 0.39]. p=.0002
  • [F(1, 28) = 13.02, MSE = 61.49, p < 0.01, η2p = 0.32]. p=.0012
  • [F(1, 28) = 3.42, MSE = 6.47, p = 0.07, η2 = 0.11]. p=.0750
The following two results were reported as statistically significant at p<.05, but actually were not significant with that alpha level:
  • [F(1, 28) = 3.98, MSE = 0.06, p < 0.01, η2p = 0.12]. p=.0559
  • [F(1, 28) = 3.40, MSE = 0.15, p = 0.04, η2p = 0.10]. p=.0758
I don't know that any of these errors dramatically changes the interpretation of the results, but they should be corrected.
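For readers who want to check p values like these themselves, here is a minimal Python sketch that computes the upper-tail p value of an F statistic from its degrees of freedom using only the standard library. It relies on the standard identity P(F > f) = I_x(df2/2, df1/2) with x = df2 / (df2 + df1·f), where I is the regularized incomplete beta function, evaluated here with the continued-fraction method described in Numerical Recipes. This isn't necessarily how the values above were computed, the function names are my own, and in practice a statistics package (e.g., scipy.stats.f.sf) is the more typical choice.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    ln_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(ln_front)
    # Use the continued fraction directly where it converges fastest,
    # otherwise use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_test_p(f, df1, df2):
    """Upper-tail p value P(F > f) for an F statistic with (df1, df2) degrees of freedom."""
    x = df2 / (df2 + df1 * f)
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, x)


if __name__ == "__main__":
    # F statistics and degrees of freedom as reported in the Frontiers paper (listed above).
    reported = [(3.24, 1, 28), (5.52, 2, 50), (4.35, 1, 28), (17.98, 1, 28),
                (13.02, 1, 28), (3.42, 1, 28), (3.98, 1, 28), (3.40, 1, 28)]
    for f, df1, df2 in reported:
        print(f"F({df1}, {df2}) = {f:.2f}: p = {f_test_p(f, df1, df2):.4f}")
```

Running it should give values close to the corrected p values listed above, which makes it easy to see which reported significance levels don't match their F statistics.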
Second, the RTs filtering was not an arbitrary decision. The lower limit (200 ms) reflects the approximate minimum amount of time necessary to respond to a given stimulus (due to speed limitations of the neurological system). RTs below this limit may reflect responses initiated before the stimulus onset (guessing). The selection of the upper limit (1100 ms) is a more difficult decision. Different criteria have been proposed (statistical, theoretical...) in the literature (see Whelan, 2008). Importantly none of them seem to affect type I errors significantly. In this case, we made the following theoretical assumption: RTs longer than 1100 ms might depend on other cognitive processes than speed of processing.
There is nothing wrong with this choice of cutoffs, and it might well have been principled. Still, it is arbitrary. Any number of cutoffs would have been just as appropriate (e.g., 150ms and 1200ms, 175ms and 3000ms, ± 3SD, etc.). My point wasn't to question the choice, but instead to note that it introduces flexibility to the analysis. This is a case in which pre-registering the analysis plan would help: the choice of cutoffs is reasonable, but flexible. Registration eliminates that flexibility, making it clear to readers that the choice of cutoffs was principled rather than based on knowing the outcome with different cutoffs. Another approach would be to assess the robustness of the results to various choices of cutoff (reporting all results).
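A robustness check of that kind is easy to automate. Here is a minimal Python sketch that applies several trimming rules to one set of reaction times and reports how many RTs survive and their mean under each rule. The fixed cutoffs mirror the examples above; treating both ends as inclusive is my assumption, and the function names and toy RTs are hypothetical. In a real analysis, you would rerun the full statistical test (not just the mean) under each rule and report all of the results.

```python
from statistics import mean, stdev


def trim_fixed(rts, lower, upper):
    """Keep RTs between lower and upper (ms); inclusive at both ends (an assumption)."""
    return [rt for rt in rts if lower <= rt <= upper]


def trim_sd(rts, n_sd=3.0):
    """Keep RTs within n_sd sample standard deviations of the mean."""
    m, s = mean(rts), stdev(rts)
    return [rt for rt in rts if abs(rt - m) <= n_sd * s]


# Trimming rules based on the cutoffs discussed in the text.
RULES = {
    "200-1100 ms": lambda rts: trim_fixed(rts, 200, 1100),
    "150-1200 ms": lambda rts: trim_fixed(rts, 150, 1200),
    "175-3000 ms": lambda rts: trim_fixed(rts, 175, 3000),
    "mean +/- 3 SD": lambda rts: trim_sd(rts, 3.0),
}


def robustness_table(rts):
    """Apply every rule; return {rule name: (number of RTs kept, mean of kept RTs)}."""
    table = {}
    for name, rule in RULES.items():
        kept = rule(rts)
        table[name] = (len(kept), mean(kept) if kept else float("nan"))
    return table


if __name__ == "__main__":
    # Hypothetical RTs for one participant, for illustration only.
    toy_rts = [180, 240, 310, 450, 520, 610, 980, 1150, 2400]
    for name, (n_kept, m) in robustness_table(toy_rts).items():
        print(f"{name:>14}: kept {n_kept}, mean RT = {m:.1f} ms")
```

With this toy sample, the ± 3 SD rule removes nothing because the 2400 ms outlier inflates the standard deviation, while the fixed cutoffs keep between six and nine RTs. That spread is exactly the kind of analytic flexibility that a pre-registered rule, or a full report across rules, makes visible.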