Open data
The “Narratives” fMRI dataset for evaluating models of naturalistic language comprehension
Ethel Franklin Betts (1908), from The Orphant Annie Book, by James Whitcomb Riley (wikimedia)
OpenNeuro ds002345
DataLad
DOI
The “Narratives” collection aggregates fMRI datasets acquired over the course of seven years (2011–2018) while participants listened to spoken stories. In aggregate, participants listened to 27 diverse stories ranging from ~3 to ~56 minutes in duration, for a total of ~4.6 hours of unique audio stimuli. The collection currently includes 345 unique subjects participating in a total of 891 functional scans with accompanying anatomical data. Data are organized in a machine-readable format according to the BIDS standard, with exhaustive metadata derived from the original DICOMs. Anonymized subject labels are linked across sessions and are accompanied by demographic and behavioral variables, including age, gender, condition, and comprehension score. Auditory stimuli are included in the dataset for non-commercial scholarly research (principally feature extraction) under fair use or fair dealing provisions. The data collection amounts to over 350,000 functional volumes of story-listening fMRI data and accompanying stimuli, totaling 6.4 days. The scripts used to collate and process these data are available in the accompanying GitHub repository. Slides for a presentation of this dataset at SfN 2019 are available on Google Slides. The public data release is accompanied by a data descriptor paper published in Scientific Data. If you find this dataset useful, please cite the following:
Nastase, S. A., Liu, Y.-F., Hillman, H., Zadbood, A., Hasenfratz, L., Keshavarzian, N., Chen, J., Honey, C. J., Yeshurun, Y., Regev, M., Nguyen, M., Chang, C. H. C., Baldassano, C., Lositsky, O., Simony, E., Chow, M. A., Leong, Y. C., Brooks, P. P., Micciche, E., Choe, G., Goldstein, A., Vanderwal, T., Halchenko, Y. O., Norman, K. A., & Hasson, U. (2021). The “Narratives” fMRI dataset for evaluating models of naturalistic language comprehension. Scientific Data, 8, 250. DOI
PDF
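Because the collection is organized according to the BIDS standard, file names can be read as underscore-separated key-value entities followed by a suffix and an extension. The following is a minimal sketch of how one might tally functional scans per subject from such names. The helper functions `parse_bids_name` and `scans_per_subject`, the task labels, and every file name in `example_files` are hypothetical and chosen for illustration only; they are not taken from the dataset itself.

```python
import re
from collections import defaultdict

# A BIDS entity is a "key-value" pair such as "sub-001" or "run-1".
ENTITY = re.compile(r"^(?P<key>[a-zA-Z0-9]+)-(?P<value>[a-zA-Z0-9]+)$")


def parse_bids_name(filename):
    """Split a BIDS-style file name into (entities, suffix, extension)."""
    # Everything after the first dot is treated as the extension (e.g. "nii.gz").
    stem, _, extension = filename.partition(".")
    # The last underscore-separated part is the suffix (e.g. "bold", "T1w").
    *entity_parts, suffix = stem.split("_")
    entities = {}
    for part in entity_parts:
        match = ENTITY.match(part)
        if match is None:
            raise ValueError(f"not a key-value entity: {part!r}")
        entities[match["key"]] = match["value"]
    return entities, suffix, extension


def scans_per_subject(filenames):
    """Count functional (bold) NIfTI files per subject label."""
    counts = defaultdict(int)
    for name in filenames:
        entities, suffix, extension = parse_bids_name(name)
        if suffix == "bold" and extension == "nii.gz":
            counts[entities["sub"]] += 1
    return dict(counts)


# Hypothetical file names, for illustration only.
example_files = [
    "sub-001_task-storyA_run-1_bold.nii.gz",
    "sub-001_task-storyA_run-2_bold.nii.gz",
    "sub-001_T1w.nii.gz",
    "sub-002_task-storyB_bold.nii.gz",
    "sub-002_task-storyB_events.tsv",
]

print(scans_per_subject(example_files))
```

In practice a dedicated BIDS-aware library would be a more robust choice than hand-written parsing; the sketch only illustrates why the standardized naming makes summaries such as subject and scan counts straightforward to compute.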
This dataset has been (re-)analyzed in the following publications:
Botch, T. L., & Finn, E. S. (2024). Neural representations of concreteness and concrete concepts are unique to the individual. Journal of Neuroscience, e0288242024.
DOI
Yazin, F., Majumdar, G., Bramley, N., & Hoffman, P. (2024). Fragmentation and multithreading of experience in the default-mode network. bioRxiv.
DOI
Chang, C. H. C., Nastase, S. A., Zadbood, A., & Hasson, U. (2024). How a speaker herds the audience: multi-brain neural convergence over time during naturalistic storytelling. Social Cognitive and Affective Neuroscience.
DOI
Ye, Z., Ai, Q., Liu, Y., de Rijke, M., Zhang, M., Lioma, C., & Ruotsalo, T. (2024). Generative language reconstruction from brain recordings. Research Square.
DOI
Yang, X., O’Reilly, C., & Shinkareva, S. V. (2024). Embracing naturalistic paradigms: substituting GPT predictions for human judgments. bioRxiv.
DOI
Kumar, S.*, Sumers, T. R.*, Yamakoshi, T., Goldstein, A., Hasson, U., Norman, K. A., Griffiths, T. L., Hawkins, R. D., & Nastase, S. A. (2024). Shared functional specialization in transformer-based language models and the human brain. Nature Communications, 15, 5523.
DOI
Li, H., Mei, K., Liu, Z., Ai, Y., Chen, L., Zhang, J., & Ling, Z. (2024). Refining self-supervised learnt speech representation using brain activations. arXiv.
DOI
Kobo, O., Yeshurun, Y., & Schonberg, T. (2024). Reward-related regions play a role in natural story comprehension. iScience, 27(6), 109844.
DOI
Janssen, J., Gallego, A. G., Díaz-Caneja, C. M., Lois, N. G., Janssen, N., González-Peñas, J., Macias, P., Buimer, E., van Haren, N., Arango, C., Kahn, R., Pol, H. H., & Schnack, H. (2024). Heterogeneity of morphometric similarity networks in health and schizophrenia. bioRxiv.
DOI
Yin, C., Ye, Z., & Li, P. (2024). Language reconstruction with brain predictive coding from fMRI data. arXiv.
DOI
Ye, Z., Zhan, J., Ai, Q., Liu, Y., de Rijke, M., Lioma, C., & Ruotsalo, T. (2024). Query augmentation by decoding semantics from brain signals. arXiv.
DOI
Li, J. (2024). On the shape of brainscores for large language models (LLMs). arXiv.
DOI
Tikochinski, R., Goldstein, A., Meiri, Y., Hasson, U., & Reichart, R. (2024). Incremental accumulation of linguistic context in artificial and biological neural networks. bioRxiv.
DOI
He, S., Guan, Y., Cheng, C. H., Moore, T. L., Luebke, J. I., Killiany, R. J., Rosene, D. L., Koo, B.-B., & Ou, Y. (2023). Human-to-monkey transfer learning identifies the frontal white matter as a key determinant for predicting monkey brain age. Frontiers in Aging Neuroscience, 15, 1249415.
DOI
He, Z., & Toyoizumi, T. (2023). Causal graph in language model rediscovers cortical hierarchy in human narrative processing. arXiv.
DOI
Oota, S. R., Agarwal, V., Marreddy, M., Gupta, M., & Bapi, R. S. (2023). Speech taskonomy: which speech tasks are the most predictive of fMRI brain activity? In Interspeech 2023 (pp. 5167–5171).
DOI
Yin, C., Yu, Q., Fang, Z., He, J., Peng, C., Lin, Z., Shao, J., & Li, P. (2023). Data contamination issues in brain-to-text decoding. arXiv.
DOI
Schmälzle, R., Liu, H., Delle, F. A., Lewin, K. M., Jahn, N. T., Zhang, Y., Yoon, H., & Long, J. (2023). Moment-by-moment tracking of audience brain responses to an engaging public speech: replicating the reverse-message engineering approach. Communication Monographs, 91(1), 31–55.
DOI
Song, H., Shim, W. M., & Rosenberg, M. D. (2023). Large-scale neural dynamics in a shared low-dimensional state space reflect cognitive and attentional dynamics. eLife, 12, e85487.
DOI
Hahamy, A., Dubossarsky, H., & Behrens, T. E. J. (2023). The human brain reactivates context-specific past information at event boundaries of naturalistic experiences. Nature Neuroscience.
DOI
Xi, N., Zhao, S., Wang, H., Liu, C., Qin, B., & Liu, T. (2023). UniCoRN: Unified Cognitive Signal ReconstructioN bridging cognitive signals and human language. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 13277–13291).
link
Bethlehem, R. A., Seidlitz, J., White, S. R., Vogel, J. W., Anderson, K. M., Adamson, C., … & Schaare, H. L. (2022). Brain charts for the human lifespan. Nature, 604(7906), 525–533.
DOI
Liu, X., Zhou, M., Shi, G., Du, Y., Zhao, L., Wu, Z., Liu, D., Liu, T., & Hu, X. (2023). Coupling artificial neurons in BERT and biological neurons in the human brain. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 37, No. 7, pp. 8888–8896).
DOI
Caucheteux, C., Gramfort, A., & King, J. R. (2023). Evidence of a predictive coding hierarchy in the human brain listening to speech. Nature Human Behaviour, 7, 430–441.
DOI
Oota, S. R., Marreddy, M., Gupta, M., & Bapi, R. S. (2023). How does the brain process syntactic structure while listening? In Findings of the Association for Computational Linguistics: ACL 2023 (pp. 6624–6647). Association for Computational Linguistics.
DOI
Chang, C. H. C., Nastase, S. A., & Hasson, U. (2022). Information flow across the cortical timescales hierarchy during narrative construction. Proceedings of the National Academy of Sciences, 119(51), e2209307119.
DOI
Oota, S. R., Gupta, M., & Toneva, M. (2023). Joint processing of linguistic properties in brains and language models. In Oh, A., Neumann, T., Globerson, A., Saenko, K., Hardt, M., & Levine, S. (Eds.), Advances in Neural Information Processing Systems 36 (pp. 18001–18014).
link
Dufumier, B., Grigis, A., Victor, J., Ambroise, C., Frouin, V., & Duchesnay, E. (2022). OpenBHB: a large-scale multi-site brain MRI data-set for age prediction and debiasing. NeuroImage, 263, 119637.
DOI
Caucheteux, C., Gramfort, A., & King, J. R. (2022). Deep language algorithms predict semantic comprehension from brain activity. Scientific Reports, 12, 16327.
DOI
Thomas, A. W., Ré, C., & Poldrack, R. A. (2022). Self-supervised learning of brain dynamics from broad neuroimaging data. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, & A. Oh (Eds.), Advances in Neural Information Processing Systems (pp. 21255–21269). Curran Associates, Inc.
link
de la Vega, A., Rocca, R., Blair, R. W., Markiewicz, C. J., Mentch, J., Kent, J. D., Herholz, P., Ghosh, S. S., Poldrack, R. A., & Yarkoni, T. (2022). Neuroscout, a unified platform for generalizable and reproducible fMRI research. eLife.
DOI
Millet, J., Caucheteux, C., Orhan, P., Boubenec, Y., Gramfort, A., Dunbar, W., Pallier, C., & King, J.-R. (2022). Toward a realistic model of speech processing in the brain with self-supervised learning. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, & A. Oh (Eds.), Advances in Neural Information Processing Systems (pp. 33428–33443). Curran Associates, Inc.
link
Schmälzle, R., Wilcox, S., & Jahn, N. T. (2022). Identifying moments of peak audience engagement from brain responses during story listening. Communication Monographs, 89(4), 515–538.
DOI
Mennen, A. C., Nastase, S. A., Yeshurun, Y., Hasson, U., & Norman, K. A. (2022). Real-time neurofeedback to alter interpretations of a naturalistic narrative. NeuroImage: Reports, 2(3), 100111.
DOI
Kumar, M., Anderson, M. J., Antony, J. W., Baldassano, C., Brooks, P. P., Cai, M. B., Chen, P.-H. C., Ellis, C. T., Henselman-Petrusek, G., Huberdeau, D., Hutchinson, J. B., Li, P. Y., Lu, Q., Manning, J. R., Mennen, A. C., Nastase, S. A., Richard, H., Schapiro, A. C., Schuck, N. W., Shvartsman, M., Sundaraman, N., Suo, D., Turek, J. S., Turner, D. M., Vo, V. A., Wallace, G., Wang, Y., Williams, J. A., Zhang, H., Zhu, X., Capota, M., Cohen, J. D., Hasson, U., Li, K., Ramadge, P. J., Turk-Browne, N. B., Willke, T. L., & Norman, K. A. (2021). BrainIAK: The Brain Imaging Analysis Kit. Aperture Neuro, 1(4).
DOI
Dominey, P. F. (2021). Narrative event segmentation in the cortical reservoir. PLOS Computational Biology, 17(10), e1008993.
DOI
Caucheteux, C., Gramfort, A., & King, J. R. (2021). Model-based analysis of brain activity reveals the hierarchy of language in 305 subjects. In Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic.
link
Caucheteux, C., Gramfort, A., & King, J. R. (2021). Disentangling syntax and semantics in the brain with deep networks. In M. Meila & T. Zhang (Eds.), Proceedings of the 38th International Conference on Machine Learning: Vol. 139 (pp. 1336–1348). Proceedings of Machine Learning Research (PMLR).
link
Nastase, S. A., Liu, Y.-F., Hillman, H., Norman, K. A., & Hasson, U. (2020). Leveraging shared connectivity to aggregate heterogeneous datasets into a common response space. NeuroImage, 116865.
DOI
Chien, H.-Y. S., & Honey, C. J. (2020). Constructing and forgetting temporal context in the human cerebral cortex. Neuron, 106.
DOI
Nastase, S. A., Gazzola, V., Hasson, U., & Keysers, C. (2019). Measuring shared responses across subjects using intersubject correlation. Social Cognitive and Affective Neuroscience, 14(6), 667–685.
DOI
Lin, X., Sur, I., Nastase, S. A., Divakaran, A., Hasson, U., & Amer, M. R. (2019). Data-efficient mutual information neural estimator. arXiv, arXiv:1905.03319.
DOI