Thursday, December 10, 2015

Visual effort and inattentional deafness


Earlier this week I was asked for my thoughts on a new Journal of Neuroscience paper: 
Molloy, K., Griffiths, T. D., Chait, M., & Lavie, N. (2015). Inattentional deafness: Visual load leads to time-specific suppression of auditory evoked responses. Journal of Neuroscience, 35, 16046-16054. doi: 10.1523/JNEUROSCI.2931-15.2015
In part due to a widely circulated press release, the paper has garnered a ton of media coverage, with headlines like:
Focusing On A Task May Leave You Temporarily Deaf: Study

Did You Know Watching Something Makes You Temporarily Deaf?

Study Explains How Screen Time Causes 'Inattentional Deafness'

The main contribution of the paper was a link between activation in auditory cortex and the behavioral finding of reduced detection of a sound (a brief tone) when performing a more difficult visual task. 

This brain-behavior link, not the behavioral result, is the new contribution from this paper. Yet almost all of the media coverage has focused on the behavioral result, which isn't particularly novel. That's unsurprising given that most of the stories just followed the lede of the press release, which was titled:
"Why focusing on a visual task will make us deaf to our surroundings: Concentrating attention on a visual task can render you momentarily 'deaf' to sounds at normal levels, reports a new UCL study funded by the Wellcome Trust"

Here are a few points about this paper that have largely been lost or ignored in the media frenzy (and the press release):

1. The study did not show that people were "deaf to their surroundings." In the study (Experiment 2), people performed an easy or hard visual task while also trying to detect a tone that occurred on half of the trials. When performing the easy visual task, they reported the tone accurately on 92% of the trials. When performing the harder visual task, they reported it accurately on 88% of trials. The key behavioral effect was a reduction of 4 percentage points in accuracy on the secondary, auditory task when the primary visual task was harder. In other words, people correctly reported the tone on the vast majority of trials even with the hard visual task. That's not deafness. It's excellent performance of a secondary task with just a slight reduction when the primary task is harder.

Aside: much of that small effect on accuracy could be due to a difference in response bias between the conditions (Beta of 3.2 compared to 1.3, a difference reported as p = 0.07 with an underpowered study of only 11 subjects).
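
To see how a shift in response bias alone could lower accuracy, here is a minimal sketch using an equal-variance Gaussian signal detection model. The sensitivity value (d' = 3.0) is a hypothetical choice for illustration and does not come from the paper; only the two Beta values are taken from the aside above, and the sketch makes no claim about which condition had which Beta.

```python
from math import log
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def accuracy(d_prime, beta, p_signal=0.5):
    """Proportion correct in a yes/no task under equal-variance Gaussian SDT.

    beta is the likelihood-ratio criterion. In this model ln(beta) = c * d',
    where c is the criterion location measured from the midpoint between
    the noise and signal distributions.
    """
    c = log(beta) / d_prime
    hit_rate = PHI(d_prime / 2 - c)
    false_alarm_rate = PHI(-d_prime / 2 - c)
    return p_signal * hit_rate + (1 - p_signal) * (1 - false_alarm_rate)


D_PRIME = 3.0  # hypothetical sensitivity, held constant across conditions

for beta in (1.3, 3.2):
    print(f"beta = {beta}: accuracy = {accuracy(D_PRIME, beta):.3f}")
```

With sensitivity held fixed, the more conservative criterion alone lowers accuracy by a bit more than one percentage point in this toy setup. How much of the observed difference bias could explain in the actual data depends on the real hit and false alarm rates, which are not reproduced here.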

2. The behavioral effect of visual load on auditory performance is not original to this paper. In fact, it has been reported by the same lab.

3. A number of other studies have demonstrated costs to detection in one sensory modality when focusing attention on another modality. This paper is not the first to show such a cross-modal effect. See, for example, here, here, here, here, and here (none of which were cited in the paper). Many other studies have shown that increasing primary task difficulty decreases secondary task performance. Again, the behavioral result touted in the media is not new, something the press release acknowledges in passing.

4. The study doesn't actually involve inattentional deafness; the term is misused. Inattentional deafness or blindness refers to a failure to notice an otherwise obvious but unexpected stimulus when focusing attention on something else. The "unexpected" part is key to ensuring that the critical stimulus actually is unattended (the justification for claiming the failure is due to inattention); people can't allocate attention to something that they don't know will be there. 

In this study, tone detection was a secondary task. People were asked to focus mostly on the visual task, but they also were asked to report whether or not a tone occurred. In other words, people were actively trying to detect the tone and they knew it would occur. That's not inattentional deafness. It's just a reduction in detection for an attended stimulus when a primary task is more demanding. And, as I noted above, it's not really a demonstration of deafness either given participants were really good at detecting the tone in both conditions (they were just slightly worse when performing a harder visual task). 

Note that the same lab previously published a paper that actually did show an effect of visual load on inattentional deafness.

Conclusion
There's nothing fundamentally wrong with this paper, at least that I can see (I'm not an expert on neuroimaging, though). The link between the behavioral results and brain imaging results is potentially interesting. I would have preferred a larger sample size and ideally measuring the link between brain and behavior in the same participants performing tasks with the same demands, but those issues aren't show stoppers. I can see why it is of interest to specialists (like me). That said, I'm not sure that it makes a contribution of broad interest to the public, and the novelty and importance of the behavioral result have been overplayed.

Monday, November 30, 2015

HI-BAR: A gold standard brain training study?

A gold-standard brain training study?
Not without some alchemy

A HI-BAR (Had I Been A Reviewer) of: 
Corbett, A., Owen, A., Hampshire, A., Grahn, J., Stenton, R., Dajani, S., Burns, A., Howard, R., Williams, N., Williams, G., & Ballard, C. (2015). The effect of an online cognitive training package in healthy older adults: An online randomized controlled trial. JAMDA, 16(11), 990-997.

Edit 12-3-15: The planned sample was ~1 order of magnitude larger than the actual one, not 2. (HT Matthew Hutson in the comments)

A recent large-scale brain training study, published in the Journal of the American Medical Directors Association (JAMDA), has garnered a lot of attention. A press release was picked up by major media outlets, and a blog post by Tom Stafford on the popular Mind Hacks blog called it “a gold-standard study on brain training” and noted that “this kind of research is what ‘brain training’ needs.”*

Tom applied the label “gold standard” because of the study’s design: It was a large, randomized, controlled trial with an active control group and blinding to condition assignment. From the gold-standard moniker, though, people might infer that the research methods and results provide solid evidence for brain training benefits. They do not.

Tom's post identified several limitations of the study, such as differential attrition across conditions and the use of a self-report primary outcome measure. Below I discuss why these and other analysis and reporting problems undermine the claims of brain training benefits. 

Problems that undermine interpretability of the study

Differential Attrition 
The analysis was based on the 6-month testing point, but the study was missing data from about 70% of the participants due to attrition. To address this problem, the authors carried forward data from the final completed testing session for each participant and treated it as if it were from the 6-month point. Critically, the control group had substantially greater attrition than the intervention groups—more of their scores were carried forward from earlier points in the intervention.

For the control group, only 27% of the data for the primary outcome and 17% of the data for the secondary outcomes came from participants who actually completed their testing at 6 months. For the Reasoning group, those numbers were 42% and 40%. For the General Cognition group, they were 40% and 30%.

The extent of the differential attrition and rates of carrying forward results from earlier sessions were only discoverable by inspecting the CONSORT diagram. This analysis choice and its implications were not fully discussed, and the paper did not report analyses of participants with comparable durations of training. This analysis approach introduces a major confound that could entirely account for any differential benefits.
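
A toy illustration of the confound, as a minimal sketch: every participant below follows exactly the same score trajectory, and the two groups differ only in how many people drop out early. The scores, the number of sessions, and the 20% and 40% completion rates are all hypothetical values chosen for illustration, not data from the paper.

```python
def locf_final(scores):
    """Return the last observed score; None marks a missed session."""
    observed = [s for s in scores if s is not None]
    return observed[-1]


def group_mean_at_endpoint(participants):
    """Mean endpoint score after carrying each last observation forward."""
    finals = [locf_final(p) for p in participants]
    return sum(finals) / len(finals)


# Hypothetical scores at four testing sessions, ending at 6 months.
# Everyone follows the SAME trajectory; groups differ only in dropout.
FULL = [10, 11, 12, 13]
EARLY_DROPOUT = [10, 11, None, None]

control = [FULL] * 2 + [EARLY_DROPOUT] * 8   # 20% complete 6 months
training = [FULL] * 4 + [EARLY_DROPOUT] * 6  # 40% complete 6 months

control_mean = group_mean_at_endpoint(control)
training_mean = group_mean_at_endpoint(training)
print(f"control: {control_mean:.1f}, training: {training_mean:.1f}")
```

The groups end up with different endpoint means even though no one benefited from anything but time in the study. The direction of the artifact depends on the shape of the trajectory: if scores drift downward over sessions instead, the group with more early dropouts would look better rather than worse.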

Unclear sample sizes and means
Tables 3 and 4 list different control group means next to each training condition. There was only one control group, so it is unclear why the critical baseline means differed for the two training interventions. Without knowing why these means differed (they shouldn't have), the differential improvements in the training groups are uninterpretable.

The Ns listed in the tables also are inconsistent with the information provided in the CONSORT diagram. In a few cases, the tables list a larger N than the CONSORT diagram, meaning that there were more subjects in the analysis than in the study.

I emailed the corresponding author (on Nov. 10 and Nov. 23) to ask about each of these issues, but I received no response. I also emailed the second author. His assistant noted that the corresponding author's team "was responsible for that part of the study" and said the second author "can be of no help with this." I’m hoping this post will prompt an explanation for the values in the tables.

For me, those reporting and analysis issues are show stoppers, but the paper has other issues.

Other issues

Limitations of the pre-registration
The study was pre-registered, meaning that the recruiting, testing methods, and analysis plans were specified in advance. Such pre-registrations are required for clinical trials, but they have been relatively uncommon in the brain training literature. Having a pre-registered plan is ideal because it eliminates much of the flexibility that otherwise can undermine the interpretability of findings. The use of pre-registration is laudable. But the registration was underspecified and the reported study deviated from it in important ways.

For example, the protocol called for 75,000 - 100,000 participants, but the reported study recruited fewer than 7000. That’s still a great sample, but it’s an order of magnitude smaller than the planned sample. Are there more studies resulting from this larger sample that just aren’t mentioned in the pre-registration?

The study also called for a year of testing, but it had to be cut short at 6 months and more than 2/3 of the participants did not undergo even 6 months of training. The pre-registration did not include analysis scripts, and the data from the study do not appear to have been posted publicly.

The pre-registered hypotheses predicted greater improvements in the reasoning training group than in the general cognition group, and they predicted that the general cognition group would not outperform the control group. The paper reports no tests of this predicted difference.

Underreporting for the primary measure (IADL)
The primary outcome measure consisted of self-reports of performance in daily activities (known as the Instrumental Activities of Daily Living or IADL). As Tom's post noted, such self-reports are subject to demand characteristics — people expect to do better following training, so they report having done better. The study did not test for different expectations across the training and control groups, so the benefits could be due to such demands or to a differential placebo effect (e.g., the control group might have found the study less worthwhile).

The reported benefits for IADLs were small, and the data provide little evidence for any benefit of training. The study reported statistically significant benefits for both training groups relative to the control group, but statistical significance is not the same as evidence. With samples this large, we should expect a substantially lower p-value than .05 when an effect actually is present in the population. If the Ns and means reported in the table were consistent with the method description, it might be possible to compute a Bayes Factor for these analyses. My bet is that the difference between the training groups and the control group would provide weak evidence at best for a meaningful training benefit (relative to the null).
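
The point about large samples can be made concrete with a rough normal-approximation sketch. The group size (2000 per group) and the true standardized effect (d = 0.2) are hypothetical values chosen for illustration, not numbers from the paper, and the calculation assumes a simple two-group comparison with equal group sizes.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_sided_p(z):
    """Two-sided p-value for a z statistic."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def expected_z(effect_size_d, n_per_group):
    """Expected z for a two-group comparison with equal group sizes."""
    return effect_size_d * sqrt(n_per_group / 2)


def effect_size_for_z(z, n_per_group):
    """Standardized effect size that would produce a given z."""
    return z / sqrt(n_per_group / 2)


N_PER_GROUP = 2000  # hypothetical group size
TRUE_D = 0.2        # hypothetical small standardized effect

z = expected_z(TRUE_D, N_PER_GROUP)
print(f"expected z = {z:.2f}, p = {two_sided_p(z):.1e}")
print(f"d implied by p = .05: {effect_size_for_z(1.96, N_PER_GROUP):.3f}")
```

Under these assumptions, a real effect of even modest size would typically yield a p-value many orders of magnitude below .05, whereas a result hovering near .05 corresponds to a standardized effect of only about 0.06.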

The paper provides no information about baseline scores on the primary outcome measure (IADL). Although the analyses control for baseline scores, training papers must provide the pre-test scores and post-test scores. Without doing so, it is impossible to evaluate whether apparent training benefits resulted in part from baseline differences.

The paper also states that “Data from interim time points also show significant benefit to IADL at 3 months, particularly in the GCT group, although this difference was not significant.” I take this to mean both training groups outperformed the control group at 3 months, but they did not differ significantly from each other. No statistical evidence is provided in support of this claim.

Limited evidence from the secondary measures
Only one of the secondary cognitive outcome measures (a reasoning measure) showed a training benefit. The paper refers to it as “the key secondary measure,” but that designation does not appear in the pre-registration. Moreover, the pre-registration predicts better performance for reasoning training than general cognition training or the control group, but the paper found improvements for both interventions. A few other measures showed significant effects, but given the large sample sizes, the high p-values might well be more consistent with the absence of a training benefit than the presence of one.

Despite providing no statistical evidence of differential benefits for Reasoning training and General Cognition training, the paper claims that “Taken together, these findings indicate that the ReaCT package confers a more generalized cognitive benefit than the GCT at 6 months.” That claim appears to come from finding no effect on a digit secondary task in the Reasoning group and a decline in the General Cognition group. However, a difference in significance is not a significant difference.
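
Here is a small sketch of why a difference in significance is not a significant difference. The effect estimates and standard errors are hypothetical, and the direct comparison treats the two estimates as independent, which is a simplification (two groups compared against the same control group would share some error).

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(estimate, standard_error):
    """Two-sided p-value for an estimate under a normal approximation."""
    z = estimate / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical effect estimates (each relative to control).
effect_a, se_a = 0.25, 0.10  # z = 2.5, "significant"
effect_b, se_b = 0.10, 0.10  # z = 1.0, "not significant"

p_a = two_sided_p(effect_a, se_a)
p_b = two_sided_p(effect_b, se_b)

# Direct test of A vs. B, treating the two estimates as independent.
diff = effect_a - effect_b
se_diff = sqrt(se_a**2 + se_b**2)
p_diff = two_sided_p(diff, se_diff)

print(f"A vs. control: p = {p_a:.3f}")
print(f"B vs. control: p = {p_b:.3f}")
print(f"A vs. B:       p = {p_diff:.3f}")
```

One effect clears the .05 threshold and the other does not, yet the direct comparison between them is nowhere near significant. Claiming that one intervention works better than another requires that direct test.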

Almost all of the measures showed declining performance from the pre-test to the post-test. That is, participants were not getting better. They just declined less than the control participants. It is unclear why we should see such a pattern of declining performance over a short time window with relatively young participants. Although cognitive performance does decline with age, presumably those declines should be minimal over 1-6 months, and they should be swamped by the benefits of taking the test twice. One explanation might be differential attrition -- those subjects who did worse initially were more likely to drop out early. 

* Thanks to Tom Stafford for emailing a copy of the paper. The journal is obscure enough that the University of Illinois library did not have access to it.