Open reviews
September 18, 2024 — Open review of Reddan and colleagues:
Reddan, M., Ong, D., Wager, T., Mattek, S., Kahhale, I., & Zaki, J. (2023). Neural signatures of emotional inference and experience align during social consensus. Research Square. DOI
Reddan and colleagues develop whole-brain models that predict (a) the observer’s inferences about a target’s emotional intensity—as conveyed by naturalistic emotional storytelling video clips—and (b) the ratings that the target in the clip provided of their own emotional intensity. They show that these neural signatures are dissociable, but converge as the observer’s ratings more closely approximate the target’s own ratings. The use of naturalistic stimuli to more effectively elicit affective processing is a major selling point. I think this manuscript is already in relatively good shape. The writing is concise and generally clear. The work is methodologically sophisticated, technically sound, and I generally believe the results. As this paper has already been through a thorough round of review, I hope that my review can point to some final steps to improve the clarity of the writing. I wrote most of this review prior to reading the reviews/responses from the prior round; it seems that previous reviewers keyed in on more theoretical and interpretational issues (reverse inference, the “ground truth” of intent ratings, etc.) and the authors have adequately addressed those concerns. Most of my comments are clarification questions and suggestions for making the paper more accessible to a broad readership.
Major comments:
My top-level comment is that on my first readthrough, I had a very difficult time understanding what data (both fMRI and behavioral) are being extracted from the in-scanner task and ultimately submitted to the whole-brain decoding/LASSO-PCR model. I understand (1) that participants in the scanner were viewing clips of other people and dynamically rating the emotional intensity, and (2) that whole-brain coefficient maps are predictors in a regression model mapping onto a target variable comprising either the subject’s ratings of observed emotional intensity (“inference”) or the person in the video’s own ratings (“intent”)—but it was very unclear to me how part 1 gets translated into part 2. For example, some questions that occurred to me as I was reading: How are these dynamic ratings being aggregated over the course of a clip? Is the LASSO-PCR model taking as input a single coefficient map for each participant, or is there somehow more dynamic information there? Most of these questions ultimately became clear, but it took a good bit of hopping back and forth between the Methods and Results. Given (a) that Figure 1 doesn’t depict in much detail how the data from the task are transformed into inputs for the model, and (b) the Results-before-Methods organization of the text, I would consider trying to clarify this upfront, as explicitly as possible. For example, you could put an additional paragraph at the very beginning of the Results section that succinctly describes the task structure, which time points go into the univariate GLM, how the ratings are coerced into a target variable, and how all of this gets channeled into a whole-brain decoding model. Something like this would have been helpful for me to more clearly understand and interpret the results.
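To make my confusion concrete, here is a minimal sketch of the pipeline as I currently understand it, using scikit-learn as a stand-in for the CANLab tools. All of the shapes, the binning of ratings into five levels, and the leave-one-subject-out scheme are my assumptions, and spelling out which of these actually hold is exactly what I’m asking for:

```python
# Toy sketch of my reading of the pipeline (all names and shapes are my
# assumptions, not the authors' code): per-subject GLM coefficient maps are
# stacked across subjects and regressed onto the rating-derived target.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso
from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict
from sklearn.pipeline import make_pipeline

n_subjects, n_levels, n_voxels = 20, 5, 5000  # hypothetical dimensions
rng = np.random.default_rng(0)

# One GLM beta map per subject per binned intensity level (inference or intent)
X = rng.standard_normal((n_subjects * n_levels, n_voxels))
y = np.tile(np.arange(1, n_levels + 1), n_subjects).astype(float)  # levels 1-5
groups = np.repeat(np.arange(n_subjects), n_levels)                # subject labels
X[:, :100] += 0.5 * y[:, None]  # inject weak signal so the toy example recovers something

# "LASSO-PCR": PCA to components, then LASSO regression on the component scores
model = make_pipeline(PCA(n_components=20), Lasso(alpha=0.1))

# Leave-one-subject-out cross-validated predictions
preds = cross_val_predict(model, X, y, groups=groups, cv=LeaveOneGroupOut())
print(np.corrcoef(preds, y)[0, 1])  # prediction-outcome correlation
```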
Following on the previous comment, particularly for a journal with the Results-before-Methods structure intended for a general readership, I would really encourage the authors to narratively hold the reader’s hand while working through the Results. For example, in the first paragraph of the Results, you report your initial result (with LOO-CV), then, out of nowhere(!), mention generalization to a “held-out validation set”… but there is no previous mention of validation sets! This left me asking “Wait, how is this different from the LOO-CV result?”, and I had to dig through the Methods to figure out where this validation set was coming from. You could mention this in an “introductory” paragraph of the Results section, as suggested in the previous comment, or at least spend one sentence setting the reader up to actually interpret this result. As another example, in the final, exploratory section of the Results, you mention: “This activity is dissociable from maps of finger tapping in the NeuroSynth database”—this also seemed to come out of nowhere, without any explicit prior motivation as to why this comparison might be important. I ultimately got the reasoning (especially after reading the Discussion), but I would still add a set-up sentence here to better situate the result.
My second-most top-level comment is that I felt like I was left wondering about these ratings for much of the paper. How many ratings did each subject supply? How similar were the inference ratings to the intent ratings? Did subjects meaningfully vary in how accurate their ratings were? Are we really justified in abstracting intensity away from valence and binning into five levels? How were the dynamic ratings aggregated across time or stimuli or subjects? How were these ratings ultimately transformed into the target variable(s)? Again, I think we eventually get most of the answers to these questions deeper in the Methods section, but it sure is difficult to feel confident in my understanding of the Results while these questions are piling up in my head. Some additional description of the ratings early on would be helpful.
I know the whole-brain decoding model is the main focus here, but the structure of the univariate GLM was a bit unclear to me, which is important as the coefficients serve as the input to the whole-brain regression model. I’m not sure whether it’s worth incorporating Figure S3 into the main text somehow, or whether it would be sufficient to just describe the GLM a bit more clearly at the end of the Introduction or the beginning of the Results.
On page 18, can you provide a little more explanation of what the “PC” part of LASSO-PCR is doing? I assume this means the voxelwise data are first reduced to principal components, and LASSO then learns a sparse mapping from those components onto the target variable? Is there some hyperparameter here for the number of PCs (maybe the default in the CANLab toolbox)? Are the weight maps constructed by projecting the component weights back onto voxels? (Sorry, this might just be a reflection of my ignorance about how these models are typically used.)
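For reference, here is the kind of procedure I’m imagining, written as a toy scikit-learn sketch rather than the authors’ actual implementation (the number of components and the back-projection step are my assumptions):

```python
# Sketch of what I take "PCR" to mean here (my assumption, not the CANLab
# implementation): voxels -> principal components -> LASSO on component scores,
# with component weights projected back into voxel space for the weight map.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5000))              # samples x voxels (toy data)
y = rng.integers(1, 6, size=100).astype(float)    # binned intensity ratings
X[:, :100] += 0.5 * y[:, None]                    # weak signal so LASSO keeps something

pca = PCA(n_components=20).fit(X)
scores = pca.transform(X)                         # samples x components
lasso = Lasso(alpha=0.1).fit(scores, y)

# Back-project: voxelwise weight map = components^T @ LASSO coefficients
voxel_weights = pca.components_.T @ lasso.coef_
print(voxel_weights.shape)                        # (5000,) one weight per voxel
```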
Can we get a little more detail about the nonparametric statistical procedures mentioned on page 18? What is being permuted in the permutation test? You randomize the relationship between participants and scores—is that correct? In the bootstrap test, for each voxel, you resample subjects with replacement to populate a bootstrap distribution around the mean coefficient? Then you subtract the mean coefficient from the bootstrap distribution to shift it to zero and perform a significance test? For example, on page 19, you say “bootstrapped and thresholded”, but bootstrapping does not natively give you a null distribution, so I assume you’re doing some kind of bootstrap hypothesis test (e.g., Hall & Wilson, 1991)? Best to spell all this out more explicitly.
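To be concrete about what I have in mind, here is a toy shift-to-null bootstrap test for a single voxel’s coefficient. This reflects my reading of the procedure, not necessarily what was actually done:

```python
# Toy bootstrap hypothesis test in the spirit of Hall & Wilson (1991):
# resample subjects with replacement, recenter the bootstrap distribution at
# zero so it serves as a null, and ask how extreme the observed mean is.
import numpy as np

rng = np.random.default_rng(0)
coefs = rng.standard_normal(20) + 0.5   # one voxel's coefficient per subject
observed = coefs.mean()

boots = np.array([rng.choice(coefs, size=coefs.size).mean()
                  for _ in range(5000)])
null = boots - observed                  # shift the distribution to zero (H0)

# Two-sided p-value: how often the recentered statistic is at least as extreme
p = np.mean(np.abs(null) >= np.abs(observed))
print(round(observed, 3), p)
```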
I would try to be super clear about what is happening at an individual-subject level and what is happening across subjects. For example, my understanding is that the LASSO-PCR analysis will yield a single set of weights across all (training) participants (for a single cross-validation fold)—i.e., you don’t end up with single-subject weight maps. However, later in the Results, you run an SVM on “subject-level intent and inference maps”, which I ultimately puzzled out must come from your within-subject univariate GLM—but since the univariate GLM isn’t very clearly described early on, this was hard to follow.
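For example, here is how I am picturing the SVM analysis on subject-level maps, as distinct from the across-subject LASSO-PCR fit (the toy data and leave-one-subject-out folds are my assumptions):

```python
# Sketch of the two levels as I understand them (my assumptions throughout):
# LASSO-PCR weights are estimated across subjects, whereas the SVM classifies
# per-subject univariate GLM maps labeled as "intent" vs. "inference".
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_subjects, n_voxels = 20, 5000

# One "intent" map and one "inference" map per subject from the univariate GLM
intent_maps = rng.standard_normal((n_subjects, n_voxels))
inference_maps = rng.standard_normal((n_subjects, n_voxels)) + 0.1

X = np.vstack([intent_maps, inference_maps])
y = np.array([0] * n_subjects + [1] * n_subjects)   # 0 = intent, 1 = inference
groups = np.tile(np.arange(n_subjects), 2)          # hold out both maps per subject

acc = cross_val_score(SVC(kernel='linear'), X, y, groups=groups,
                      cv=LeaveOneGroupOut()).mean()
print(acc)
```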
Minor comments:
Page 4: “beta maps” feels like jargon
Page 6: I’m not sure you need this full definition of cosine similarity here, especially since you already have it in the Methods (page 19).
Page 6: “bran” > “brain”
Page 6: I would briefly (or parenthetically) mention what kind of cross-validation is used with the SVM here
Page 7: “z = -2.71”—any particular reason this z-value is negative? I assume it’s arbitrary based on what you subtracted from what
Page 8: “quickly and accurately”—I’m not sure how “quickly” is justified here(?), but maybe I’m missing something about the dynamics of the task
Page 8: “highly dynamic”—how do you mean?
Page 9: “observer’s” > “observers’” (plural possessive)
Page 18: What does “OLS partial r” mean? My understanding is that this is just a correlation between the model-predicted and actual target values (e.g., intent scores) for your left-out test samples. Or is this some kind of “partial” correlation? If so, what other variable(s) are getting partialed out?
Page 19: “across published literature” > “across the published literature”
Figure 4 caption: “bran” > “brain”
References:
Hall, P., & Wilson, S. R. (1991). Two guidelines for bootstrap hypothesis testing. Biometrics, 47(2), 757–762. DOI