June 29, 2019 — Open review of preprint by DuPre and colleagues:
DuPre, E., Hanke, M., & Poline, J. (2019). Nature abhors a paywall: how open science can realize the potential of naturalistic stimuli. PsyArXiv.
DuPre and colleagues provide a timely open science perspective on the logistical challenges for data (and stimulus!) sharing in naturalistic neuroimaging. They highlight tools developed to address these challenges, and provide a useful index of current shared naturalistic imaging datasets. In general, I think the article will make a great contribution to the special issue. My main suggestion is that I think the article (coming from authoritative authors on this topic) could be a little more prescriptive. Three important audiences for this piece are: (1) labs collecting naturalistic data and considering sharing; (2) labs looking to analyze shared naturalistic data; (3) institutional leadership. All three would benefit from any concrete recommendations as to best practices the authors would be willing to make. I provide a handful of comments and suggestions—to be considered at the authors’ discretion given the limited scope of the article.
Regarding fair use: We should also encourage our institutions (universities, funding agencies) to advocate for more permissive fair use legislation in the case of non-profit scientific research (e.g., automated feature extraction, which in many cases does not even involve viewing the stimulus). The commercial industries (and the legislative bodies financially beholden to them) that fight back are impeding scientific progress for what are, in many cases, negligible profits. Might also be worth spending a sentence or two describing what falls under the scope of fair use and what boilerplate a researcher should use when sharing stimuli under fair use.
In your discussion of the success of studyforrest.org, you might emphasize that dedicated data collection and curation will be a critical step toward scalable neuroscience (as it is in, e.g., physics and astronomy). Also relevant to incentives, persistent identifiers for data, and the credit assignment problem for data generators; e.g., https://www.nature.com/articles/d41586-019-01715-4.
You have a column for Data License in the table. It might be worthwhile to spend a sentence or two explaining what these licenses mean, if there’s any meaningful distinctions, etc. Most importantly, I’d make some minimal recommendation to folks who are considering sharing their naturalistic imaging data as to what license to use. You might consider including an approximate N number of subjects per dataset in the table (maybe in the “stimulus duration” column). Some datasets have relatively short stimuli and large samples of subjects (e.g., Cam-CAN) and vice versa.
I think your paragraphs on the “choice of stimuli” undersells how interesting this question is. For example, unedited non-cinematic clips of “reality” are dramatically less engaging (and yield weak ISCs; Hasson et al., 2008, 2010). On the other hand, Hollywood-style movies are kind of a caricature of the real world and incorporate a century of cinematic tricks to maintain engagement, guide attention, and manipulate the viewer’s beliefs. There’s a fundamental tension between selecting or “designing” naturalistic stimuli/paradigms to probe specific scientific questions and generalization. May also be worthwhile to design stimuli/paradigms to facilitate out-of-sample prediction (as in, e.g., Huth et al., 2012). In general, I think your taxonomy of (three) strategies for selecting stimuli could be made a little more compelling.
When I was looking through the table, I thought “Is it worth including a column with links to these datasets?” Might be good to indicate in the caption that the dataset names link to their OpenNeuro locations. (Could also cite one of the OpenNeuro papers for hosting the data.)
Minor comments, typos, and suggested re-wording:
Pointer to Chris Gorgolewski’s tweet suggesting additional datasets to be added to your table: https://twitter.com/chrisgorgo/status/1147542414216892417?s=20; as well as Seeliger et al., 2019.
Line 8: We use naturalistic as… > We use the term “naturalistic” as most…
Line 9: “podcasts” seems pretty specific, might change to e.g. “spoken stories or webcasts”
Line 20: Typo “datasetseven” > “datasets, even”
Line 57: “narrativesperhaps” > “narratives, perhaps”
Line 58: “re-annotationsthe” > “re-annotation, the”
Line 85: “to tools including” > “for tools such as”
Line 115: “stimuli offers” > “stimuli offer”
Baldassano et al. 2018: “Real-World” > “real-world”
Chen et al., 2016: “Real-World” > “real-world”; “High-Order” > “high-order”
Etchells, 2017: Extra comma/space in reference.
Lettieri et al., 2019: Is this on a preprint server? Or missing journal.
Shafto et al., 2014, Taylor et al., 2017: Capitalization of “Cambridge Centre” etc. Add journal to Taylor reference.
Van Essen et al., 2013: Check capitalization.
Wehbe et al., 2014: This should have the PLOS One journal information.
Zadbood et al., 2017: Cerebral Cortex journal information.
Hasson, U., Landesman, O., Knappmeyer, B., Vallines, I., Rubin, N., & Heeger, D. J. (2008). Neurocinematics: the neuroscience of film. Projections, 2(1), 1–26.
Hasson, U., Malach, R., & Heeger, D. J. (2010). Reliability of cortical activity during natural stimulation. Trends in Cognitive Sciences, 14(1), 40–48.
Huth, A. G., Nishimoto, S., Vu, A. T., & Gallant, J. L. (2012). A continuous semantic space describes the representation of thousands of object and action categories across the human brain. Neuron, 76(6), 1210–1224.
Seeliger, K., Sommers, R. P., Güçlü, U., Bosch, S. E., & van Gerven, M. A. J. (2019). A large single-participant fMRI dataset for probing brain responses to naturalistic stimuli in space and time. bioRxiv, 687681.