May 15, 2020 — Open review of preprint by Papale and colleagues:
Papale, P., Leo, A., Handjaras, G., Cecchetti, L., Pietrini, P., & Ricciardi, E. (2019). Shape coding in occipito-temporal cortex relies on object silhouette, curvature and medial-axis. bioRxiv, 814251.
Papale and colleagues apply variance partitioning to a searchlight representational similarity analysis in order to localize unique and shared contributions of five visual models to cortical object representation. Three models capture shape according to silhouette, curvature, and medial-axis “skeletal” structure, and models of inked area and category identity are included as well. I was invited to review this paper after an initial round of review, and have (private) access to the initial reviews and the authors’ response. The authors seem to have substantially improved the manuscript in response to the first round of review, and have also added several helpful supplementary figures. The manuscript is already at a relatively high level. My central comments address a couple of issues of statistical interpretation that should be made more explicit in the manuscript. I also offer a handful of wording suggestions to improve clarity.
The authors should be careful about their use of the term “orthogonal.” As far as I understand, “unique” contributions recovered by commonality analysis are not strictly orthogonal in the statistical sense (i.e. uncorrelated; see e.g. Creager, 1971). I think using the term “orthogonal” more loosely when thinking conceptually about the “unique” and “shared” variance accounted for by each model is probably fine, but it would be useful to include a sentence with the caveat that these unique portions of variance are not strictly orthogonal in a statistical sense. (The authors can correct me if I’m misunderstanding the statistics here.) In general, I think this variance-partitioning (or commonality analysis) approach to RSA (as in e.g. Groen et al., 2012, 2018; Ramakrishnan et al., 2015; Greene et al., 2016; Hebart et al., 2018) is very fruitful, and I expect we’ll see more of it in the future.
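To make the bookkeeping behind “unique” and “shared” variance concrete, here is a minimal sketch of commonality analysis for the two-model case, where each unique term is the drop in R-squared when that model is removed and the shared term is the remainder. The function names and the toy data are hypothetical and chosen only for illustration; this is not the authors' implementation, which involves five models.

```python
import math
import random

def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def commonality_two_models(y, m1, m2):
    """Partition the R^2 of y ~ m1 + m2 into unique and shared parts."""
    r_y1 = pearson_r(y, m1)
    r_y2 = pearson_r(y, m2)
    r_12 = pearson_r(m1, m2)
    # Closed-form R^2 for a regression with two predictors.
    r2_full = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
    return {
        "full": r2_full,
        "unique_1": r2_full - r_y2**2,   # R^2 lost when model 1 is dropped
        "unique_2": r2_full - r_y1**2,   # R^2 lost when model 2 is dropped
        "shared": r_y1**2 + r_y2**2 - r2_full,
    }

# Hypothetical toy data standing in for vectorized RDMs (one value per
# stimulus pair); the two model vectors are deliberately correlated.
rng = random.Random(0)
m1 = [rng.gauss(0, 1) for _ in range(200)]
m2 = [0.6 * a + 0.8 * rng.gauss(0, 1) for a in m1]
y = [a + 0.5 * b + rng.gauss(0, 1) for a, b in zip(m1, m2)]

parts = commonality_two_models(y, m1, m2)
print(parts)
```

The three components sum to the full-model R-squared by construction. Note that this is a partition of R-squared values, which is a different thing from decomposing the predictors into uncorrelated components.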
The central analyses apply variance partitioning to the group-average RDMs (and there is some precedent for this in the literature, e.g. Groen et al., 2018). However, it should be made clear that this treats subject (and stimulus) as a fixed effect and does not license statistical inference to the population. Although fixed-effects analyses are somewhat atypical, I think this is fine as long as the authors make sure that their interpretations match the analysis and state explicitly in the manuscript that this is a fixed-effects analysis, so that readers interpret the results correctly. (Alternatively, the authors could perform subject-level commonality analysis, yielding R-squared values per subject that could be submitted to a statistical test for population inference. I suspect this would yield qualitatively similar results.)
The supplementary figures are present in my (private) copy of the response to reviews and on Zenodo, but I don’t see any reference to them in the actual manuscript text (on my private copy from the journal or on bioRxiv). Maybe this is just a versioning issue? In any case, I think these figures are very useful, and the authors should make sure they are adequately described (the current captions are very brief) and provide pointers to them in the main text.
I’m trying to understand the reasoning behind the category identity RDM in relation to the stimulus set. The object identity RDM doesn’t really take into account semantic relationships between categories (e.g. animacy, real-world size), but does group different exemplars (sometimes with different orientations) into the same category. Importantly, this RDM demarcates the shared category membership between visually similar yet different stimuli, e.g. fish, teapot, and “fish-shaped teapot.” The motivation for this relies heavily on the experimental logic behind using a stimulus set like this, and I had a hard time understanding it without looking at the actual stimuli in Figure S7. I think readers could benefit from a bit more explanation of the structure (e.g. exemplars per category, crucial for understanding the rows/columns of the RDMs) and reasoning behind the stimulus set and what it entails for the identity model.
In certain parts of the manuscript (e.g. Figure 4), the authors make fairly specific anatomical claims about topographic cortical organization. However, all of the analyses are performed using searchlights where the result for any given searchlight is mapped to the center voxel of that searchlight, which introduces some unavoidable imprecision in localization. With a 6 mm radius and 2 mm isotropic voxels, each searchlight probably contains ~120 voxels (depending on the sphere implementation). So, for example, values for searchlight centers near the collateral sulcus will aggregate information from voxels elsewhere, potentially including voxels nearer to the occipitotemporal sulcus. This caveat should be noted in the manuscript.
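As a rough check on that voxel count, here is a minimal sketch that counts the voxel centers on a 2 mm grid falling within 6 mm of the searchlight center. The function name and the inclusive/strict boundary option are my own assumptions for illustration; the authors' software may define the sphere differently.

```python
from itertools import product

def voxels_in_sphere(radius_mm, voxel_mm, inclusive=True):
    """Count grid voxels whose centers lie within radius_mm of the center voxel."""
    r_vox = radius_mm / voxel_mm          # radius expressed in voxel units
    reach = int(r_vox)                    # furthest grid offset worth checking
    count = 0
    for i, j, k in product(range(-reach, reach + 1), repeat=3):
        d2 = i * i + j * j + k * k        # squared distance in voxel units
        if d2 < r_vox**2 or (inclusive and d2 == r_vox**2):
            count += 1
    return count

n_inclusive = voxels_in_sphere(6.0, 2.0, inclusive=True)
n_strict = voxels_in_sphere(6.0, 2.0, inclusive=False)
print(n_inclusive, n_strict)
```

Under these assumptions the count is 123 voxels with an inclusive boundary and 93 with a strict one, which is why the exact number depends on the sphere implementation.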
I would encourage the authors to make the fMRI data and stimuli publicly available in the future (e.g. on OpenNeuro) so that other groups can test different models and different analysis techniques against this dataset. For example, it would be straightforward and interesting to apply the RSA-based partitioning approach within predefined ROIs (e.g. Wang et al., 2015) rather than searchlights; and surely there are many other object recognition and shape models that may provide additional insights. Ideally, the code used for this project should also be made publicly available (e.g. on GitHub).
News & Noteworthy: I would be careful with language like “Which shape description is encoded in our brain and thus really mirrors the way we perceive shapes?” There may well be many intermediate representations of shape in the brain that are not directly related to how we psychologically perceive shape (except perhaps under peculiar task demands).
Page 3, line 45: “modulating its radial” > “modulating radial”
Page 3, line 45: see “News & Noteworthy” comment; the “truly” is a bit grating
Page 3, line 62: “attended” > “viewed” (since attention was presumably directed toward the fixation task)
Figure 1 caption, E: “control” > “controls”
Functional MRI data acquisition: Make sure you’re not missing any “mandatory” acquisition parameters as outlined in the COBIDAS report: http://www.humanbrainmapping.org/files/2016/COBIDASreport.pdf
Page 9, line 172: “3dtshift” > “3dTshift”
Page 9, line 177: I don’t understand what you mean by “linear (FLIRT) and nonlinear registration (FNIRT) to MNI152 standard space.” Do you analyze the results of both linearly and nonlinearly normalized data? Or is there a linear alignment step followed by a nonlinear alignment step? Or do you mean the anatomical image is linearly aligned to the functional images, and also nonlinearly aligned to the MNI template? (This might just be my ignorance of FSL pipelines.)
Page 10, line 193: “Pearson’s rho” should be “Pearson’s r” because you’re using the sample statistic r (rho is typically reserved for the population statistic)
Page 11, line 223: I don’t understand “partial correlation coefficient” here in the context of commonality analysis. Are you actually computing a partial correlation? Or are you converting from the unique and shared R-squared values from the commonality analysis?
Page 11, line 227: What is randomized in the TFCE analysis? Is it flipping the sign on the regression coefficients? TFCE is providing nonparametric cluster-level inference, but what does “minimum cluster size = 10 voxels” mean? Is this an arbitrary cluster-extent threshold imposed after TFCE to clean up isolated “significant” voxels here and there? Make sure these things are explicitly stated.
Page 12, line 239: I might begin this sentence with something like “To evaluate whether orthogonality varies from posterior to anterior visual areas, …” just so the reader immediately understands why you’re talking about XY slices here.
Page 13, line 261: The “collinearity” observed here is a function of both the models and this particular stimulus set (which may or may not be “representative” of objects in the world)
Page 13, line 269: “mildly” seems like an odd word choice
Page 21, line 390: reword “assessed”
Page 21, line 392: “independently” > “uniquely”
Page 23, line 438: reword “While this may apparently result in contrast”
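Regarding the question about page 11, line 227: for concreteness, this is the kind of sign-flipping randomization I have in mind, reduced to a single location and without TFCE. It is a sketch under my own assumptions (one value per subject, a one-sided test of mean > 0, exhaustive enumeration of sign flips); the function name and the example values are hypothetical, and I am not suggesting this is what the authors did.

```python
from itertools import product

def sign_flip_pvalue(values):
    """Exact one-sided sign-flip test of whether the mean exceeds zero."""
    n = len(values)
    observed = sum(values) / n
    as_extreme = 0
    total = 0
    # Enumerate every assignment of +1/-1 to the subjects (2**n in total).
    for signs in product((1, -1), repeat=n):
        flipped_mean = sum(s * v for s, v in zip(signs, values)) / n
        # Small tolerance guards against floating-point ties.
        if flipped_mean >= observed - 1e-12:
            as_extreme += 1
        total += 1
    return as_extreme / total

# Hypothetical per-subject values (e.g. regression coefficients) at one voxel.
example_values = [0.12, 0.30, -0.05, 0.22, 0.18, 0.09, 0.27, 0.15]
p_value = sign_flip_pvalue(example_values)
print(p_value)
```

Stating explicitly which quantity is sign-flipped (or otherwise permuted), and at which level (subject or stimulus), would let readers reconstruct the null distribution that TFCE is applied to.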
Creager, J. A. (1971). Orthogonal and nonorthogonal methods for partitioning regression variance. American Educational Research Journal, 8(4), 671–676.
Greene, M. R., Baldassano, C., Esteva, A., Beck, D. M., & Fei-Fei, L. (2016). Visual scenes are categorized by function. Journal of Experimental Psychology: General, 145(1), 82–94.
Groen, I. I., Ghebreab, S., Lamme, V. A., & Scholte, H. S. (2012). Spatially pooled contrast responses predict neural and perceptual similarity of naturalistic image categories. PLoS Computational Biology, 8(10), e1002726.
Groen, I. I., Greene, M. R., Baldassano, C., Fei-Fei, L., Beck, D. M., & Baker, C. I. (2018). Distinct contributions of functional and deep neural network features to representational similarity of scenes in human brain and behavior. eLife, 7, e32962.
Hebart, M. N., Bankson, B. B., Harel, A., Baker, C. I., & Cichy, R. M. (2018). The representational dynamics of task and object processing in humans. eLife, 7, e32816.
Ramakrishnan, K., Scholte, H. S., Groen, I. I., Smeulders, A. W., & Ghebreab, S. (2015). Visual dictionaries as intermediate features in the human brain. Frontiers in Computational Neuroscience, 8, 168.
Wang, L., Mruczek, R. E., Arcaro, M. J., & Kastner, S. (2015). Probabilistic maps of visual topography in human cortex. Cerebral Cortex, 25(10), 3911–3931.