Open reviews

March 11, 2024 — Open review of Zhang and colleagues:

Zhang, Y., Fan, L., Jiang, T., Dagher, A., & Bellec, P. (2024). Connectome-constrained neural decoding reveals a representational hierarchy from perception to cognition to action. bioRxiv. DOI


Zhang and colleagues use biologically constrained (i.e., by anatomical or functional brain atlases) graph neural networks (BGNNs) to embed task-related fMRI data. They show that the resulting BGNN representations can be used to decode cognitive tasks, predict behavioral responses, and capture interpretable cortical topographies. The approach seems interesting and I generally believe the results, but I find it difficult to understand (methodologically) what the authors have done. The Main Text is a bit too concise, to the point where it becomes hard to both (a) follow the narrative, and (b) understand how the analyses were implemented—partly because so much has been relegated to the supplement. I’m also concerned about whether the claims of “state-of-the-art” performance are really justified.

Major comments:

My main comment is that I found it hard to follow the narrative of the results described in the Main Text. I understand that perhaps journal formatting requirements preclude the standard structure of a methods section, but I think readers need a bit more hand-holding to really appreciate the results. First of all, I think it would be helpful to add a few more explanatory sentences early on to make very explicit what task(s) the fMRI subjects are performing, what the model receives as input, how it is trained, and what exactly it predicts—I think you can say this very concisely in a couple sentences to help the reader get the gist, then keep most of the implementation details in the supplement. Second, some of the language is very general (e.g. referring to many different tasks), whereas the main text figures focus on particular examples. I found it hard to match up the more general claims in the text to the corresponding main text figures and supplementary figures. Third, for each section of the results, it would be helpful to more explicitly set up or motivate the upcoming result; e.g. one sentence each to address “what is the specific question this section will address?” and “how did the authors methodologically address this question?”. I provide some more specific examples of points of confusion in the following comments.

In the second paragraph of the main text, I think it would be worthwhile to add a sentence or two briefly describing (with appropriate citation) how a “graph neural network” works and how you’re leveraging it here. I’m not an expert in GNNs, and I worry that GNNs will just seem like “magic” to other non-expert readers (similarly for “graph convolutions”). You could unpack this a bit more later in the text as well, but I felt like I wanted a stronger motivation for why GNNs might be an important and qualitative improvement over previous methods (and not just a fancy new tool).
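
For non-expert readers, even a minimal sketch of what a single graph convolution computes would demystify things considerably. As a rough illustration of the kind of operation involved (a generic Kipf & Welling-style graph convolutional layer over a parcel-by-parcel connectivity graph; the atlas size, feature dimensions, and variable names below are my own placeholders, not the authors’ implementation):

```python
import numpy as np

# Hypothetical setup: N brain parcels from an atlas, F features per parcel
# (e.g., fMRI signal at one or more time points), and an N x N adjacency
# matrix A encoding anatomical or functional connectivity between parcels.
N, F_in, F_out = 400, 16, 32
rng = np.random.default_rng(0)

A = rng.random((N, N)); A = (A + A.T) / 2        # symmetric connectivity graph
X = rng.standard_normal((N, F_in))               # node (parcel) features
W = rng.standard_normal((F_in, F_out)) * 0.1     # learnable layer weights

# One graph convolution (Kipf & Welling, 2017): each parcel's new features are
# a normalized, connectivity-weighted sum of its own and its neighbors'
# features, passed through a linear map and ReLU.
A_hat = A + np.eye(N)                            # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt         # symmetric normalization
H = np.maximum(A_norm @ X @ W, 0.0)              # propagate, transform, ReLU
print(H.shape)                                   # (400, 32)
```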

On page 9, when you extend “the time window to a longer duration (e.g., 10s)”, do you mean you’re fitting a new model estimated from multiple volumes following each trial, rather than just a single volume? I would consider re-writing this section on dynamics to make clearer exactly what the model takes as input (a single volume, at what delay? multiple volumes, across what range?).
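
To illustrate the distinction I’m asking about, here is a toy sketch of the two readings (a single volume at a fixed hemodynamic delay vs. a multi-volume window following each trial); the TR, delay, and window length are placeholder values, not taken from the paper:

```python
import numpy as np

# Toy fMRI run: (n_volumes, n_parcels); TR and trial onsets are placeholders.
TR = 0.72
bold = np.random.randn(500, 400)
trial_onsets_s = np.array([10.0, 34.0, 58.0])    # trial onsets in seconds

# Reading (a): a single volume at a fixed delay after each trial onset.
delay_s = 6.0
single_volume = bold[((trial_onsets_s + delay_s) / TR).astype(int)]   # (3, 400)

# Reading (b): a window of volumes (e.g., 10 s) following each trial onset.
window_s = 10.0
n_win = int(window_s / TR)
windows = np.stack([bold[int(t / TR): int(t / TR) + n_win]
                    for t in trial_onsets_s])                         # (3, 13, 400)
print(single_volume.shape, windows.shape)
```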

The authors make claims about the model revealing a hierarchy of neural representations in rather broad terms, but my understanding of the example in Figure 2 is that the different layers of the model essentially just recover the structure of the working memory task: face vs house (or place, tool) and 0-back vs 2-back. It was hard for me to distill how well this generalizes based on sifting through the supplementary results.

Relatedly, in the section on “multilevel neural representations”, you introduce the notion of “modules” without any real explanation of what you mean by this term or how you arrive at the modules. I would add a sentence or two here describing what you mean by modules so the reader doesn’t have to search through the supplement to understand the main text.

At the beginning of page 9, it feels like the cerebellum kind of comes out of nowhere. My first thought was “did these lines get mistakenly copied in from a different paper?” Is this section on the cerebellum really more important than all the other material that’s already been relegated to the supplement? (It doesn’t make any appearance in the main text figures as far as I can tell…) If you’re going to explore differences between brain atlases combined with cerebellar atlases, I would try to set this up a little bit more clearly in the narrative of the results. Some of the language in this section is also a bit unclear; for example, “the decodability of cerebellar atlases” reads as if you’re trying to decode different atlases, rather than examine how decodability of cognitive tasks differs across cerebellar atlases. I would consider splitting this off into a separate paragraph and adding a sentence or two to introduce and motivate this particular subset of analyses.

We need more details about the cross-validation procedure. You mention 5-fold cross-validation in the abstract and at one other point in the text. I assume you mean cross-validation across subjects (e.g. train on 960 subjects, test on 240 subjects for each fold)?
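
If the split is indeed across subjects, spelling it out with something like scikit-learn’s GroupKFold (subject IDs as the grouping variable; the subject and trial counts below are just illustrative) would make the procedure unambiguous and rule out leakage of same-subject data across folds:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Illustrative setup: 1200 subjects, several trials per subject.
n_subjects, trials_per_subject = 1200, 8
subject_ids = np.repeat(np.arange(n_subjects), trials_per_subject)   # group labels
X = np.random.randn(len(subject_ids), 400)                           # features
y = np.random.randint(0, 7, size=len(subject_ids))                   # task labels

# 5-fold cross-validation split by subject: no subject appears in both the
# training and test set of any fold (~960 train / ~240 test subjects per fold).
gkf = GroupKFold(n_splits=5)
for fold, (train_idx, test_idx) in enumerate(gkf.split(X, y, groups=subject_ids)):
    train_subj = np.unique(subject_ids[train_idx])
    test_subj = np.unique(subject_ids[test_idx])
    assert not set(train_subj) & set(test_subj)      # disjoint subject sets
    print(f"fold {fold}: {len(train_subj)} train / {len(test_subj)} test subjects")
```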

The authors make strong claims that their model achieves “state-of-the-art” performance on a variety of tasks in the abstract and throughout the text—but state-of-the-art relative to what? I see comparisons of decoding performance using the authors’ BGNN representations to SVM decoding on raw fMRI data, task-related functional connectivity, and GLM contrast maps in Figure 2C—but I don’t think any of these really correspond to the previous “state-of-the-art”. For example, I don’t think the authors make any comparison to other hyperalignment methods (e.g. Feilong et al., 2021). In the machine learning world, people can reasonably claim “state-of-the-art” when there are well-validated benchmarks and authors can either report or reproduce the performance of other published models. Unfortunately in our field it seems more difficult to say what “state-of-the-art” would really mean. I don’t think this really undermines the authors’ results; I just think it would probably be best to rein in or remove the term “state-of-the-art” to avoid unnecessarily overhyping the current (strong) results.

On a related note, in Figure 2C, it’s hard to tell if the authors’ method meaningfully outperforms SVM decoding on GLM contrast maps without error bars or some statistical test. I don’t see any statistical comparison between the authors’ method and the baseline methods on pages 8 or 9.
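
Even something simple would help here, e.g., reporting per-fold (or per-subject) accuracies with error bars and running a paired test between methods; a sketch with made-up accuracy values purely for illustration:

```python
import numpy as np
from scipy import stats

# Made-up per-fold decoding accuracies (5 folds) purely for illustration.
bgnn_acc = np.array([0.91, 0.93, 0.90, 0.92, 0.94])
glm_svm_acc = np.array([0.88, 0.90, 0.89, 0.90, 0.91])

# Paired comparison across folds (a permutation test or a corrected resampled
# t-test across folds/subjects would also be reasonable choices).
diff = bgnn_acc - glm_svm_acc
t_stat, p_t = stats.ttest_rel(bgnn_acc, glm_svm_acc)
w_stat, p_w = stats.wilcoxon(bgnn_acc, glm_svm_acc)
print(f"mean difference = {diff.mean():.3f} (SD across folds = {diff.std(ddof=1):.3f})")
print(f"paired t-test: t = {t_stat:.2f}, p = {p_t:.3f}")
print(f"Wilcoxon signed-rank: W = {w_stat:.1f}, p = {p_w:.3f}")
```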

Minor comments:

Page 7, line 26: Guntupalli et al., 2018 also comes to mind when using connectivity to estimate a shared brain space.

Page 7, line 28: “Few attempts have tried to align task-evoked brain responses across subjects and to preserve subject-specific characteristics of cognitive processes at the same time.” I’m not sure what you intend to say with this sentence. I think this sentence describes precisely the goal of the hyperalignment methods you cite in the previous sentences—i.e. there’s a large body of work dedicated to this problem. I would consider rephrasing this to say something along the lines of “it’s not clear how best to introduce anatomical priors into this process yet.”

Page 7, line 41: “proposes a brain-atlas-constrained” > “proposes brain-atlas-constrained”

Page 8, line 30: “state-of-art” > “state-of-the-art”

Page 8, line 48: “impact on the decoding” > “impact the decoding”

Page 8, line 50: “neuroinformation” —is this a commonly-used word? (maybe just “neural features”)

Page 9, line 20: “remain stable by using” > “remained stable when using”

Page 9, line 28: “nearly accurate” > “highly accurate”

Figure 1: I found this figure a bit hard to follow; the high level of abstraction may lead to some confusion. A couple of examples: I don’t really understand what the light blue arrow pointing from the panel B scatter plot (right) to the panel C decoding phase (left) is meant to convey. There are also a lot of different font sizes and formats here; it’s probably best to standardize the labels a bit.

References:

Feilong, M., Guntupalli, J. S., & Haxby, J. V. (2021). The neural basis of intelligence in fine-grained cortical topographies. eLife, 10, e64058. DOI

Guntupalli, J. S., Feilong, M., & Haxby, J. V. (2018). A computational model of shared fine-scale structure in the human connectome. PLOS Computational Biology, 14(4), e1006120. DOI