Open reviews
December 15, 2024 — Open review of Ke and colleagues:
Ke, J., Song, H., Bai, Z., Rosenberg, M. D., & Leong, Y. C. (2023). Predicting emotional arousal during naturalistic movie-watching from whole-brain dynamic connectivity across individuals and movie stimuli. bioRxiv. DOI
Ke and colleagues develop a predictive model mapping from dynamic functional connectivity to behavioral ratings of arousal during naturalistic movie-viewing. They show that this model generalizes across individual brains (leave-one-subject-out cross-validation) and, importantly, across quite different movies (different characters, events, themes, etc.). I received a version of this manuscript from the journal that has already been reviewed and revised at least once; for example, it seems that the authors have removed a less-convincing non-connectivity-based predictive model based on the previous round of reviews. That seems fine to me; I read the current version of the paper prior to looking at the reviews/responses and it seemed coherent and complete on its own. Overall, the manuscript is well-written, the figures are excellent, the modeling and statistical analyses seem rigorous, and the authors include a number of useful control analyses to reinforce their conclusions. In sum, I think this manuscript is already in a very good place and essentially ready for publication. I provide a handful of relatively minor comments and suggestions below.
My main comment is that I think the authors should be careful about how they set up and frame their work in terms of “individual-specific” neural signatures or “generalization across individuals”. For example, in the Introduction (page 3), the authors write “One possibility is that these representations are specific to each situation and each individual” and ultimately contrast their findings with this possibility. But the authors are mapping individual brain activity onto a Y variable that’s a consensus measure of arousal collected in a separate sample of subjects. This setup is not designed to find individual-specific neural signatures of arousal, because you don’t have individual-specific behavioral ratings collected in the very same fMRI subjects “in the moment” while they watch the films. Put differently: in this setup, you either find a signature that generalizes across subjects or you don’t; you would need behavioral ratings in the scanning subjects if you wanted to provide positive evidence for individual-specific neural signatures of arousal. I think the authors’ current approach is perfectly fine and interesting enough in its own right; I would just encourage them to be careful in the writing not to inadvertently set up the reader to expect individual-specific results.
In setting up the predictive modeling results (page 8), you state “In every round of cross-validation we selected FCs that significantly correlated with arousal in the training set as features.” I suspect this line will tend to raise red flags for readers, whether justified or not. If I understand correctly, this feature selection strategy is fair (not double-dipping) because the edges are selected in the training set (N – 1 subjects); maybe just say this more explicitly? Or if there’s a precedent in the field for doing feature selection in this way, maybe cite that as well? I suspect the implication that each training fold will use a slightly different set of features may bother people (although these feature sets should be highly correlated due to the similarity between training sets in leave-one-out cross-validation). Anyway, even just a single additional sentence here could preempt readers from becoming suspicious.
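To make the logic concrete, here's a rough sketch of the fold-wise feature selection I understand the authors to be doing (the variable names, the p < .05 threshold, averaging dynamic FC across training subjects, and the ridge model are all my own illustrative choices, not necessarily the authors' actual pipeline):

```python
# Leave-one-subject-out cross-validation with fold-wise feature selection.
# The point: edges are selected using only the N - 1 training subjects,
# so the held-out subject never influences which features are chosen.
import numpy as np
from scipy import stats
from sklearn.linear_model import Ridge

def loso_cv(fc, arousal, p_thresh=0.05):
    """fc: (n_subjects, n_timepoints, n_edges) dynamic FC features;
    arousal: (n_timepoints,) consensus arousal ratings."""
    n_subjects, n_timepoints, n_edges = fc.shape
    predictions = np.zeros((n_subjects, n_timepoints))
    for test_sub in range(n_subjects):
        train = np.delete(np.arange(n_subjects), test_sub)
        # Average dynamic FC across training subjects (a simplification on my part)
        train_fc = fc[train].mean(axis=0)              # (n_timepoints, n_edges)
        # Select edges whose time courses significantly correlate with arousal,
        # computed on the training data only -- no double-dipping
        pvals = np.array([stats.pearsonr(train_fc[:, e], arousal)[1]
                          for e in range(n_edges)])
        selected = pvals < p_thresh
        # Fit a simple regression on the selected edges and predict the
        # held-out subject's arousal time course from their own FC
        model = Ridge().fit(train_fc[:, selected], arousal)
        predictions[test_sub] = model.predict(fc[test_sub][:, selected])
    return predictions
```

Spelling out something like this (even just in a sentence) would make it obvious that the selected feature set can differ slightly from fold to fold, but that the held-out subject plays no role in selection.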
I’m still not sure I fully “get” why the within-dataset null distributions (e.g., Figure 3) are shifted so far above zero. In general, I think the statistical approach is sound, and this positively shifted null distribution will only make the tests more conservative. But any extra effort to build an intuition up front for why this happens would be appreciated.
I’m curious whether the authors have tried using intersubject functional connectivity (ISFC; Simony et al., 2016) rather than within-subject functional connectivity to drive their predictive model. ISFC would presumably isolate the stimulus-driven component of connectivity (maybe a good thing?), but might effectively filter out arousal signals that are not very precisely synchronized to the stimulus (maybe a bad thing?), and I suspect would ultimately reduce the generalizability of the model across different stimuli. Anyway, I understand this would complicate things and I don’t think it’s necessary for the current manuscript—just wondering if the authors have any intuitions or anecdotal evidence on how ISFC would affect their model.
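For reference, the rough leave-one-out ISFC computation I have in mind follows Simony et al. (2016); this is my own simplified sketch (and for the dynamic case one would presumably apply it within sliding windows), not the authors' code:

```python
# Leave-one-out intersubject functional connectivity (ISFC): correlate each
# subject's ROI time courses with the average time courses of the remaining
# subjects, retaining mainly the stimulus-locked component of connectivity.
import numpy as np

def isfc(data):
    """data: (n_subjects, n_timepoints, n_rois) ROI time courses for one stimulus."""
    n_subjects, n_timepoints, n_rois = data.shape
    # z-score each ROI time course within subject
    z = (data - data.mean(axis=1, keepdims=True)) / data.std(axis=1, keepdims=True)
    mats = np.zeros((n_subjects, n_rois, n_rois))
    for sub in range(n_subjects):
        others = np.delete(np.arange(n_subjects), sub)
        avg = z[others].mean(axis=0)                      # (n_timepoints, n_rois)
        avg = (avg - avg.mean(axis=0)) / avg.std(axis=0)  # re-z-score the average
        m = z[sub].T @ avg / n_timepoints                 # subject-to-group correlations
        mats[sub] = (m + m.T) / 2                         # symmetrize
    return mats
```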
In Figure 2, it looks like the error bands around the mean arousal/valence ratings are fairly consistent across time. I’m wondering: have the authors looked at time points of the film where variance in ratings is low versus high? Again, I don’t think this is necessary for the current manuscript—just curious.
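Concretely, the kind of split I'm imagining is just a median split on the across-rater standard deviation at each time point (hypothetical names, not tied to the authors' data):

```python
# Split time points by across-rater variability in the behavioral ratings.
import numpy as np

def split_by_rating_variance(ratings, quantile=0.5):
    """ratings: (n_raters, n_timepoints). Returns low- and high-variance masks."""
    sd = ratings.std(axis=0)            # across-rater SD at each time point
    cutoff = np.quantile(sd, quantile)  # median split by default
    return sd <= cutoff, sd > cutoff
```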
I appreciate the effort the authors have put into making their code and models publicly available—that’s great! (Looking forward to seeing the behavioral rating data released publicly as well.)
Page 4: you say “30 in each condition”—at first I was confused about what “conditions” you were referring to; I now understand you mean 2 movies (Sherlock, FNL) x 2 ratings (arousal, valence). You might want to define the conditions earlier or adjust the wording.
Figure 3 caption: “for within-dataset (the left panel) and between-dataset (the right panel)” feels like it needs another noun at the end (e.g., “cross-validation”), or just take out the “for”.
References:
Simony, E., Honey, C. J., Chen, J., Lositsky, O., Yeshurun, Y., Wiesel, A., & Hasson, U. (2016). Dynamic reconfiguration of the default mode network during narrative comprehension. Nature Communications, 7, 12141. DOI