# Open reviews

## March 29, 2018 — Open review of preprint by Pallarés and colleagues: review

Pallarés, V., Insabato, A., Sanjuan, A., Kuehn, S., Mantini, D., Deco, G., & Gilson, M. (2018). Extracting orthogonal subject- and behavior-specific signatures from fMRI data using whole-brain effective connectivity. bioRxiv, 201624. DOI

Pallarés and colleagues describe a method for classifying subjects and tasks (rest or movie-viewing) based on effective connectivity between brain regions, and demonstrate that this outperforms similar approaches using generic correlation-based functional connectivity. The manuscript is methodologically sophisticated and dense. My main concerns are about the readability of the manuscript. Some small organization changes could make this more comprehensible for mere mortals. I provide some suggestions about this, as well as some specific minor comments. The figures and data visualization are very nice. Overall, I think this is a timely, high-quality manuscript, and is suitable for publication.

The “Results first, Methods last” organization of the manuscript was difficult for me to follow. I found myself writing down many questions only to have them answered 40 minutes later after many pages. Lots of flipping back and forth, and many readers won’t take more than a single pass through the paper. Assuming the authors want to keep this organization, I would suggest interspersing a few sentences in the Results section to guide the narrative.

I wouldn’t so explicitly trash the kNN algorithm—it’s also a classic machine learning algorithm, and can perform surprisingly well in many situations despite its simplicity. Also, arbitrarily setting k = 1 may cripple the algorithm, as k is a hyperparameter that could be chosen using, e.g., nested cross-validation. I understand you’re making a comparison to existing reports, and am not suggesting any changes to your analyses—just suggesting to not set it up as an inherently inferior algorithm.

I’m a little confused about your use of MLR and RFE. Do you first use MLR without any feature elimination, then use RFE to find the best features? (I spent some time wondering whether you were using RFE for all of the MLR results.) With regard to Figure 6, you plot the 57 most discriminative links for subject classification. Are all 100 links plotted in the background? This could benefit from a little more hand-holding for the reader.

I would consider changing the title to “Extracting orthogonal subject- and task-specific signatures…” to be consistent with your use of “task” (or condition) within the manuscript, and because describing rest and movie-watching as behaviors may ruffle some feathers.

In abstract, what does “the different modalities” refer to?

Line 145: Are the “WSS and BSS distributions” (shown in Fig. 2B) comprised of the WSS and BSS correlation values in the S? This was not obvious to me on the first pass through. Is there any concern that the diagonal WSS blocks contribute many fewer correlations to the distribution than the much more numerous off-diagonal BSS cells of S? Presumably this doesn’t violate any assumptions of KS distance? Might be best to explicitly mention this just to assuage less-informed readers (like me).

Table 2: Why does FC0 differentiate WSS and BSS so much worse than corrFC? I would have expected these to be similar.

Line 197: What do Finn et al., 2015, 2017, and Vanderwal et al., 2017 do that is not “trustworthy”? I would describe what they do, and then how your approach is an improvement on their approach.

Figure 3B and 3C. How are the standard deviations (colored areas) computed? And are the 100 repetitions cross-validation folds? Is this standard deviation across cross-validation folds?

Figure 3: At the end of the caption you have “test test”.

Line 229: “information [is] conveyed”

Figure 4B: The average RFE ranking colorbar is a little ambiguous to me. Does a rank near 0 indicate a group of areas that most contributes to classifier performance or least contributes?

Figure 4D: Having more tick marks, e.g., at intervals of 200 or 500, rather than only two exponential notation tick marks would make this easier to read.

Line 743: Change “multilinear” to “multinomial” to be consistent with terminology used throughout.

You might include a sentence making it more obvious how your MLR implementation mediates the high dimensionality of all links.