# Open reviews

### December 27, 2020 — Open review of preprint by Xu and colleagues: PubPeer

Xu, Y., Vignali, L., Collignon, O., Crepaldi, D., & Bottini, R. (2020). Brain network reconfiguration for narrative and argumentative thought. bioRxiv. DOI

Xu and colleagues use intersubject correlation (ISC and ISFC) analyses to compare stimulus-locked responses and functional network configuration between narrative and argumentative stimuli. The distinction between discourse types is supported by behavioral ratings of the stimuli. They show that narrative stimuli preferentially engage the default-mode network, whereas argumentative stimuli increase coupling between frontoparietal and language areas. Overall, the paper is well-written and the figures look good. I outline some concerns about the overall framing of the results, suggest a more direct comparison between the conditions of interest, and have a couple questions about the methods.

The psychological argument that complex thoughts boil down into “two natural kinds”—narrative and argumentative thought—doesn’t seem very compelling to me. The authors point to work by James and Bruner (which I like a lot), but is there any more contemporary empirical evidence supporting this distinction? Why should we believe these are “natural kinds,” rather than just different ends of a discourse spectrum, or different regions of a more complex discourse space? For example, it seems that argumentative texts might sometimes intersperse narrative bits for the sake of explication or keeping the reader engaged (and vice versa); would such a hybrid text still be considered argumentative? This coarse-grained dichotomy doesn’t seem to really respect the complexity of actual discourse. (Also makes me wonder whether modern text analysis or topic modeling methods could better capture the finer-grained flow between narrative and argumentative modes over the course of a single text.)

Following on the previous comment, my main concern with the manuscript is whether there could be some less-interesting “third variable”—such as how “engaging” or “interesting” or “personally relevant” the stimulus is—driving the distinction between the two types of stimuli. If I were to look at the ISC results for these four stimuli outside the context of the authors’ narrative-vs-argumentative framing, my interpretation would simply be: “yes, listening to Calvino is much more engaging than listening to Pinker” (at least in my personal experience, and especially for non-linguists!). This is a very difficult problem to “solve,” though, as narrative discourse may be generally more engaging than argumentative discourse in natural contexts. Can the authors address this concern? For example, I think the behavioral analysis of qualities like “concreteness,” “self-projection,” etc is great, but I don’t see any questions about how engaging participants found the stimulus, or how much they enjoyed reading it. I worry that these metrics would also be higher for narrative texts, which complicates the story. (I also wonder if the authors have any other evidence of engagement; e.g. wakefulness, pupil dilation, etc).

A side note to my first comment is that the jump from (a) two distinct psychological constructs for narrative and argumentative thought to (b) two distinct neural substrates for narrative and argumentative thought seems fairly weak. There doesn’t seem to be much work pointing to a brain-based distinction, so it seems like the authors are playing a balancing act of trying to effectively introduce this potential distinction, and, at the same time, pointing to references to support a preexisting distinction. The authors’ approach to framing this potential distinction as an empirical question (i.e. content-dependent hypothesis vs content-independent hypothesis) makes sense to me in lieu of strong preexisting empirical evidence for a brain-based distinction.

It seems that one of the core comparisons here would be between the intact narrative ISCs and the intact argumentative ISCs. Specifically, I would want to see that the variance between the two intact discourse types is greater than the variance among the two exemplar stimuli within each discourse type (i.e. 2 x 2 ANOVA). However, I don’t see this direct comparison anywhere. The authors focus on various contrasts (e.g. intact > scramble), but it would be helpful to see the (pre-contrast) intact ISC maps. This simpler sort of analysis could better set the reader up to interpret the more complex logic of the contrast analysis (which follow the form of “the difference between intact and scrambled versions of narrative stimuli is greater than the difference between intact and scrambled versions of argumentative stimuli”).

Following on the previous comment, a difficult problem with a design like this is convincing readers that this result will generalize to different exemplar stimuli from the narrative and argumentative modes. With this in mind, I think the analysis of consistency across stimulus exemplars within a discourse type could benefit from a more formal treatment. For example, a correlation between the unthresholded intact ISC maps for each exemplar within a discourse type (or ICC?) would provide a more quantitative measure of consistency than just eyeballing the (thresholded) maps in Figure S4 and S5. (And again, in something like Figure S3, S4, or S5 it would be nice to see the pre-contrast intact ISC maps for all four stimuli).

I’m curious why we don’t see any real difference in ISC maps between intact and scrambled argumentative stimuli (line 146, 155, Figure 2). I can imagine a scenario where: (a) narrative stimuli require the ordered sequence of events where B -> A -> C to be comprehensible and a sentence-scrambled version like C -> A -> B is nonsensical; while (b) argumentative stimuli are structured such that A -> B -> C and B -> A -> C are effectively identical. Is this the case with these stimuli? Or could this be a result of playing the scrambled stimuli before the intact stimuli during the scanning session? More discussion of why the ISC maps for intact and scrambled argumentative stimuli are so similar would be helpful.

I didn’t fully understand the method for mitigating confounds described at lines 468–473 (maybe because I’m not very familiar with ICA-AROMA). As far as I understand, you use ICA-AROMA to obtain both signal and motion components. Then you use a GLM to fit these components to the original BOLD signal, which yields regression coefficients (betas). Then you subtract(?) the betas(?) for the motion components from the… what? I’m confused about the sentence “We estimated the beta coefficients using the fitglm function in Matlab 2019a and subtracted the motion-relevant terms from the BOLD signal.” Do you mean you subtract the motion-related IC time series from the BOLD time series? In that case, what’s the role of the betas? Is there a precedent in the literature for using this “non-aggressive” approach to confounds, and why would we want to “preserve the shared variance between the motion-relevant components and the signal components” (which could simply be due to motion)? Then you regress out the mean time series for CSF and white matter? I’m confused about the distinction between “subtracting the terms(?) from the signal” and the more usual approach of regressing out confound signals (i.e. fitting a regression equation with confound variables and retaining the residual time series for further analysis). In any case, intersubject correlation analyses are fairly robust to idiosyncratic noise, so I doubt the details of confound mitigation make a huge difference here—but I’d like to better understand what’s going on. (I would also be cautious about multi-stage filtering in your approach; Lindquist et al., 2019.)

For contrasting paired ISC values, a permutation test randomly shuffling the group assignments may be a more sensitive (and appropriate) test than using a bootstrap hypothesis test. (Or, equivalently, randomly flipping the signs of the paired contrast across subjects at each permutation; 2^16 possible permutations). My reasoning here is that this sort of permutation test would accommodate the paired structure of the samples whereas the bootstrap does not. I’m also a bit unsure about allowing for a left-out subject to be randomly sampled into the average of N – 1 subjects in a bootstrap sample (line 525). This might make your bootstrap distribution positively skewed (i.e. higher than the original mean leave-one-out ISC), which would result in an overly stringent null distribution (although I’m not sure it would affect contrasts the same way). Same for the ISFC analysis. Note that in our BrainIAK implementation of the bootstrap hypothesis test for leave-one-out ISC, we resample the ISC values themselves (i.e. after computing leave-one-out ISC) rather than resampling the subjects, then computing ISC (https://github.com/brainiak/brainiak/blob/master/brainiak/isc.py#L650).

When reading the Results, it wasn’t obvious to me that subjects were listening to an auditory recording of the stimulus rather than reading a text version of the stimulus (didn’t realize this until line 409).

Line 173: I would expand this “SES” acronym when you initially introduce it (and point to reference 60)

Lines 242, 255, 582: “insignificant” > “nonsignificant”

Line 269: “stimuli-evoked” > “stimulus-evoked”

Line 275: “The argumentative” > “Argumentative”

Line 276: “the functional couplings” > “functional coupling”

Line 285: “after” > “for”

Line 306: “simultaneously featured by two factors” sounds odd to me

Line 326: “yeo” > “Yeo”

Line 327: “unite” > “unit”

Line 359: “that failed to be segmented” > “that segmentation failed”

Line 371: “initialing” > “initial”

Line 548: “The averaged the” > “The averaged”

Line 550: “pair-wised” > “pairwise”

Figure S7: “fsaverag5” > “fsaverage5”

Figure S7: “metric-to-volune” > “metric-to-volume”

I encourage the authors to share their code (e.g. via GitHub). This seems like a very rich naturalistic dataset, and I hope the authors plan to share it publicly (e.g. via OpenNeuro).

### References:

Lindquist, M. A., Geuter, S., Wager, T. D., & Caffo, B. S. (2019). Modular preprocessing pipelines can reintroduce artifacts into fMRI data. Human Brain Mapping, 40(8), 2358–2376. DOI