November 9, 2020 — Open review of preprint by Maffei and Sessa:
Maffei, A., & Sessa, P. (2020). Event-related network changes unfold the dynamics of cortical integration during face processing. bioRxiv.
Maffei and Sessa use several graph-theoretic metrics to describe functional network properties of EEG data acquired while participants viewed face (and animal) images. In general, I think this is a straightforward application of these methods and will be of interest to those working on the time course of face processing. I raise a couple interpretational and statistical concerns, followed by minor comments and suggestions on wording.
My main concern is interpretational: some of the core results of this paper (at least statistically) hinge on the comparison between the face and animal stimuli/task. This puts certain limitations on the generalizability of these findings. For example, using the term “non-face” stimuli is a bit misleading here because we’re not licensed to generalize these results beyond the animal control (papers on item analysis in neuroimaging come to mind; e.g. Bedny et al., 2007 Westfall et al., 2016, Yarkoni, 2019). The two tasks being compared are qualitatively different. For the face stimuli, participants are effectively performing an emotional expression discrimination task; for the animal stimuli, the participants are discriminating animal species (with some nonexistent horse-cow hybrid at intermediate stages). The two stimulus classes also have very different low-level visual properties as well; e.g. the faces are cropped and round, whereas the animals have elongated protrusions on a white background. I realize this isn’t really something that can be “fixed”—it was a core design choice from the beginning. This concern only affects certain conclusions; for example, simply showing that the modularity metric dips over the time course of face processing doesn’t rely on the animal contrast. But I think the authors can be more careful in interpreting results that stem from statistical differences between the two stimulus types.
Are there behavioral data alongside the EEG data? For example, at line 10 you justify the stimuli/tasks as “minimizing the confounds related to having different levels of difficulty between the experimental conditions.” Is behavioral performance roughly similar between the two stimuli/tasks?
The permutation-based tests described in the “Statistical Analysis” section weren’t entirely clear to me. What is actually being permuted here? Are you randomizing the time series? How does the permutation test aggregate across subjects and does it license group-level inference? For example, in the case of assessing the significance of a difference between face and animal conditions, I would expect the within-subject condition assignments to be shuffled to generate the appropriate null distribution. Maybe this is my ignorance of certain conventions in the field, but this terminology of maximum T, p-value, and cluster size reported in the results also wasn’t clear. Are these t-values in the thousands? …seems odd.
As far as I understand, the Granger causality analysis is statistically evaluated using an F-test at each node. However, it wasn’t clear to me if any correction was applied for multiple univariate tests. Maybe FDR is appropriate here? You say “Performing the test in both directions, allowed us to identify only nodes showing a significant ( = .05) Granger-causality”—can you expand on the reasoning here?
In Figure 2, are the electrodes inside the contour lines the significant ones? Also, make explicit that this contrast is face > animal. There are several different color scales used for the scalp maps here; I appreciate that the effects are similar across metrics and thresholds, but I’m wondering if it might be more informative to use one scale so we can compare panels more easily.
In Figure 5, why do we observe flat modularity for the animal stimuli? Based on the leftmost 5% panel, it may interact with the threshold. This worries me a bit because it suggests there’s something fundamentally different about the face and animal tasks—but the face results are largely interpreted with reference to the animal task.
First paragraph of the Introduction is very long; I would consider breaking this up a bit.
Page 5, line 50: “prevents to achieve” > “precludes”
Page 5, line 54: “techniques has been” > “techniques, it has been”
Page 6, line 8: reword “mazy” (convoluted?)
Page 6, line 40: “is nevertheless” > “it is nevertheless”
Page 7, line 12: I would change “the evolution of the network” to something like “the temporal evolution or responses in the network” just to make clear what you mean by “evolution”
Page 7, line 24: Unsure what you mean by “the knowledge of the state of the network”
Page 9, line 38: Really only 6 male out of 34 subjects?
Page 9, line 60: “ending to” > “ending at”
Page 10, line 3: “step” > “steps”
Page 12, line 5: “derived computing” > “derived by computing”
Page 14, line 17: “has nonetheless” > “nonetheless has”
Page 14, line 24: “influent” > “influential”
Page 14, line 28: Any justification for the particular damping value of .85? Is there a precedent for this? Did you try other values?
Page 15, line 27: “statistics” > “statistic”
Page 15, line 33: “networks” > “network”
Page 22, line 9: Make clear that these >0 values (e.g. 0.2–0.4) are in the direction of a random network (not a lattice).
Page 26, line 3: Is there some precedent in the literature for considering -0.5 to 0.5 as the range for “small-worldness”?
Page 26, line 8: Are you referring to small-world topology popularized in resting-state fMRI? Seems like quite a stretch (in terms of time-scale, physiological basis) from rs-fMRI to EEG.
Bedny, M., Aguirre, G. K., & Thompson-Schill, S. L. (2007). Item analysis in functional magnetic resonance imaging. NeuroImage, 35(3), 1093–1102.
Westfall, J., Nichols, T. E., & Yarkoni, T. (2016). Fixing the stimulus-as-fixed-effect fallacy in task fMRI. Wellcome Open Research, 1, 23.
Yarkoni, T. (2019). The generalizability crisis. PsyArXiv.