Open data

The “Narratives” fMRI dataset for evaluating models of naturalistic language comprehension

Image: Ethel Franklin Betts (1908), from The Orphant Annie Book, by James Whitcomb Riley (Wikimedia)

OpenNeuro ds002345 `OpenNeuro` `DataLad` `DOI`

The “Narratives” collection aggregates fMRI datasets acquired over seven years (2011–2018) while participants listened to spoken stories. In aggregate, participants listened to 27 diverse stories ranging from ~3 to ~56 minutes, for a total of ~4.6 hours of unique audio stimuli. The collection currently includes 345 unique subjects participating in a total of 891 functional scans with accompanying anatomical data. Altogether, it amounts to over 350,000 functional volumes of story-listening fMRI data and accompanying stimuli, totaling 6.4 days.

Data are organized in a machine-readable format according to the BIDS standard, with exhaustive metadata derived from the original DICOMs. Anonymized subject labels are linked across sessions and are accompanied by demographic and behavioral variables, including age, gender, condition, and comprehension score. Auditory stimuli are included in the dataset for non-commercial scholarly research (principally feature extraction) under fair use or fair dealing provisions.

The scripts used to collate and process these data are available on GitHub. Slides for a presentation of this dataset at SfN 2019 are available on Google Slides. The public data release is accompanied by a data descriptor paper published in Scientific Data. If you find this dataset useful, please cite the following:

Nastase, S. A., Liu, Y.-F., Hillman, H., Zadbood, A., Hasenfratz, L., Keshavarzian, N., Chen, J., Honey, C. J., Yeshurun, Y., Regev, M., Nguyen, M., Chang, C. H. C., Baldassano, C., Lositsky, O., Simony, E., Chow, M. A., Leong, Y. C., Brooks, P. P., Micciche, E., Choe, G., Goldstein, A., Vanderwal, T., Halchenko, Y. O., Norman, K. A., & Hasson, U. (2021). The “Narratives” fMRI dataset for evaluating models of naturalistic language comprehension. Scientific Data, 8, 250. DOI PDF
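Because the collection follows the BIDS standard, file names can be read programmatically as chains of key-value entities. The following is a minimal Python sketch of that idea, not code from the dataset's repository. The function name `parse_bids_name` and the example path (including the task label `story`) are hypothetical and chosen only for illustration; real file names in the release may carry additional entities.

```python
import re
from pathlib import PurePosixPath

# A BIDS entity is an alphanumeric key and an alphanumeric label joined by a hyphen,
# e.g. "sub-001" or "task-story".
ENTITY = re.compile(r"^(?P<key>[a-zA-Z0-9]+)-(?P<value>[a-zA-Z0-9]+)$")


def parse_bids_name(path):
    """Split a BIDS-style file name into entities, suffix, and extension.

    Hypothetical helper for illustration; for real work a dedicated BIDS
    library is likely the better choice.
    """
    name = PurePosixPath(path).name
    # Everything after the first dot is treated as the extension (".nii.gz").
    stem, dot, ext = name.partition(".")
    parts = stem.split("_")
    entities = {}
    # All underscore-separated parts except the last are key-value entities;
    # the last part is the suffix (for example "bold" or "T1w").
    for part in parts[:-1]:
        match = ENTITY.match(part)
        if match is None:
            raise ValueError(f"not a BIDS key-value entity: {part!r}")
        entities[match["key"]] = match["value"]
    return {"entities": entities, "suffix": parts[-1], "extension": dot + ext}


if __name__ == "__main__":
    # Hypothetical path, not taken from the dataset listing.
    info = parse_bids_name("sub-001/func/sub-001_task-story_bold.nii.gz")
    print(info["entities"], info["suffix"], info["extension"])
```

A parser like this is enough to group functional scans by subject or task label before loading any imaging data.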

This dataset has been (re-)analyzed in the following publications:

Zhang, Y., Yin, C., Xia, R., & Li, P. (2025). Brain encoding oriented text semantic disentangling and analysis. In 2025 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (pp. 3288–3294). IEEE.
Lv, C., Li, X., & Wang, W. (2025). An end-to-end framework for reconstructing continuous language from fMRI data. In 2025 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (pp. 3904–3907). IEEE. DOI
Yang, X., Gao, C., Xiao, C., Riccardi, N., & Desai, R. H. (2026). Multifaceted neural representation of words in naturalistic language. arXiv. DOI
Yin, C., Yu, Q., Fang, Z., Peng, C., & Li, P. (2025). Rethinking cross-subject data splitting for brain-to-text decoding. In C. Christodoulopoulos, T. Chakraborty, C. Rose, & V. Peng (Eds.), Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (pp. 5675–5689). Association for Computational Linguistics. DOI
Remedios, S. W., Carass, A., Prince, J. L., & Dewey, B. E. (2025). Diffusion-driven generation of minimally preprocessed brain MRI. arXiv. DOI
Moussa, O., & Toneva, M. (2025). Brain-tuning improves generalizability and efficiency of brain alignment in speech models. arXiv. DOI
Binhuraib, T., Gao, R., & Ivanova, A. A. (2025). LITcoder: a general-purpose library for building and comparing encoding models. arXiv. DOI
Samara, A., Zada, Z., Vanderwal, T., Hasson, U., & Nastase, S. A. (2025). Cortical language areas are coupled via a soft hierarchy of model-based linguistic features. bioRxiv. DOI
Rahimi, M., Yaghoobzadeh, Y., & Daliri, M. R. (2025). Explanations of large language models explain language representations in the brain. arXiv. DOI
Yin, C., Zhang, Y., Wen, X., & Li, P. (2025). Improve language model and brain alignment via associative memory. arXiv. DOI
Yang, L., Guo, L., Yuan, Y., Han, J., Hu, X., & Zhang, T. (2025). A foundational fMRI model for representing continuous brain states. IEEE Journal of Biomedical and Health Informatics. DOI
Tu, Z., Dai, L., Zhang, B., Chen, S., Yang, Y., Meng, D., Gong, Y., & Sun, J. (2025). Revealing human brain syntactic processing: insights from voxel-wise models and network representation. Brain and Language, 265, 105569. DOI
Janssen, J., Guil Gallego, A., Díaz-Caneja, C. M., Gonzalez Lois, N., Janssen, N., González-Peñas, J., Gordaliza, P. M., Buimer, E., van Haren, N., Arango, C., Kahn, R., Hulshoff Pol, H. E., & Schnack, H. G. (2025). Heterogeneity of morphometric similarity networks in health and schizophrenia. Schizophrenia, 11, 70. DOI
Fialoke, S., Deb, A., Rode, K., Tripathi, V., & Garg, R. (2025). Temporal synchronization analysis: a model-free method for detecting robust and nonlinear brain activation in fMRI data. bioRxiv. DOI
Chen, Y., Zada, Z., Nastase, S. A., Ashby, F. G., & Ghosh, S. S. (2025). Context modulates brain state dynamics and behavioral responses during narrative comprehension. bioRxiv. DOI
AlKhamissi, B., Tuckute, G., Tang, Y., Binhuraib, T., Bosselut, A., & Schrimpf, M. (2025). From language to cognition: how LLMs outgrow the human language network. arXiv. DOI
Zada, Z., Nastase, S. A., Speer, S., Mwilambwe-Tshilobo, L., Tsoi, L., Burns, S., Falk, E., Hasson, U., & Tamir, D. (2025). Linguistic coupling between neural systems for speech production and comprehension during real-time dyadic conversations. bioRxiv. DOI
Tikochinski, R., Goldstein, A., Meiri, Y., Hasson, U., & Reichart, R. (2025). Incremental accumulation of linguistic context in artificial and biological neural networks. Nature Communications, 16, 803. DOI
Ye, Z., Ai, Q., Liu, Y., de Rijke, M., Zhang, M., Lioma, C., & Ruotsalo, T. (2025). Generative language reconstruction from brain recordings. Communications Biology, 8, 346. DOI
Linli, Z., Liang, X., Zhang, Z., Hu, K., & Guo, S. (2025). Enhancing brain age estimation under uncertainty: a spectral-normalized neural gaussian process approach utilizing 2.5D slicing. NeuroImage, 311, 121184. DOI
Botch, T. L., & Finn, E. S. (2024). Neural representations of concreteness and concrete concepts are unique to the individual. Journal of Neuroscience, e0288242024. DOI
Kang, K., Seidlitz, J., Bethlehem, R. A., Xiong, J., Jones, M. T., Mehta, K., Keller, A. S., Tao, R., Randolph, A., Larsen, B., Tervo-Clemmens, B., Feczko, E., Dominguez, O. M., Nelson, S. M., Lifespan Brain Chart Consortium, Schildcrout, J., Fair, D. A., Satterthwaite, T. D., Alexander-Bloch, A., & Vandekar, S. (2024). Study design features increase replicability in brain-wide association studies. Nature, 636(8043), 719–727. DOI
Raccah, O., Chen, P., Gureckis, T. M., Poeppel, D., & Vo, V. A. (2024). The “Naturalistic Free Recall” dataset: four stories, hundreds of participants, and high-fidelity transcriptions. Scientific Data, 11, 1317. DOI
Usman, M., Rehman, A., Shahid, A., Rehman, A. U., Gho, S. M., Lee, A., Khan, T. M., & Razzak, I. (2024). Multi-task adversarial variational autoencoder for estimating biological brain age with multimodal neuroimaging. arXiv. DOI
Rehman, A. U., Rehman, A., Usman, M., Shahid, A., Gho, S. M., Lee, A., Khan, T. M., & Razzak, I. (2024). Biological brain age estimation using sex-aware adversarial variational autoencoder with multimodal neuroimages. arXiv. DOI
Dominey, P. F. (2024). A connectivity gradient in structured reservoir computing predicts a hierarchy for mixed selectivity in human cortex. In 2024 International Joint Conference on Neural Networks (IJCNN) (pp. 1–8). IEEE. DOI
Sun, H., Zhao, L., Wu, Z., Gao, X., Hu, Y., Zuo, M., Zhang, W., Han, J., Liu, T., & Hu, X. (2024). Brain-like functional organization within large language models. arXiv. DOI
Bao, R., He, S., Grant, E., & Ou, Y. (2024). AGE2HIE: transfer learning from brain age to predicting neurocognitive outcome for infant brain injury. arXiv. DOI
Yazin, F., Majumdar, G., Bramley, N., & Hoffman, P. (2024). Fragmentation and multithreading of experience in the default-mode network. bioRxiv. DOI
Chang, C. H., Nastase, S. A., Zadbood, A., & Hasson, U. (2024). How a speaker herds the audience: multi-brain neural convergence over time during naturalistic storytelling. Social Cognitive and Affective Neuroscience. DOI
Yang, X., O’Reilly, C., & Shinkareva, S. V. (2024). Embracing naturalistic paradigms: substituting GPT predictions for human judgments. bioRxiv. DOI
Kumar, S.*, Sumers, T. R.*, Yamakoshi, T., Goldstein, A., Hasson, U., Norman, K. A., Griffiths, T. L., Hawkins, R. D., & Nastase, S. A. (2024). Shared functional specialization in transformer-based language models and the human brain. Nature Communications, 15, 5523. DOI
Li, H., Mei, K., Liu, Z., Ai, Y., Chen, L., Zhang, J., & Ling, Z. (2024). Refining self-supervised learnt speech representation using brain activations. arXiv. DOI
Kobo, O., Yeshurun, Y., & Schonberg, T. (2024). Reward-related regions play a role in natural story comprehension. iScience, 27(6), 109844. DOI
Yin, C., Ye, Z., & Li, P. (2024). Language reconstruction with brain predictive coding from fMRI data. arXiv. DOI
Ye, Z., Zhan, J., Ai, Q., Liu, Y., de Rijke, M., Lioma, C., & Ruotsalo, T. (2024). Query augmentation by decoding semantics from brain signals. arXiv. DOI
Li, J. (2024). On the shape of brainscores for large language models (LLMs). arXiv. DOI
He, S., Guan, Y., Cheng, C. H., Moore, T. L., Luebke, J. I., Killiany, R. J., Rosene, D. L., Koo, B.-B., & Ou, Y. (2023). Human-to-monkey transfer learning identifies the frontal white matter as a key determinant for predicting monkey brain age. Frontiers in Aging Neuroscience, 15, 1249415. DOI
He, Z., & Toyoizumi, T. (2023). Causal graph in language model rediscovers cortical hierarchy in human narrative processing. arXiv. DOI
Oota, S. R., Agarwal, V., Marreddy, M., Gupta, M., & Bapi, R. S. (2023). Speech taskonomy: which speech tasks are the most predictive of fMRI brain activity? In Interspeech 2023 (pp. 5167–5171). DOI
Yin, C., Yu, Q., Fang, Z., He, J., Peng, C., Lin, Z., Shao, J., & Li, P. (2023). Data contamination issues in brain-to-text decoding. arXiv. DOI
Schmälzle, R., Liu, H., Delle, F. A., Lewin, K. M., Jahn, N. T., Zhang, Y., Yoon, H., & Long, J. (2023). Moment-by-moment tracking of audience brain responses to an engaging public speech: replicating the reverse-message engineering approach. Communication Monographs, 91(1), 31–55. DOI
Song, H., Shim, W. M., & Rosenberg, M. D. (2023). Large-scale neural dynamics in a shared low-dimensional state space reflect cognitive and attentional dynamics. eLife, 12, e85487. DOI
Hahamy, A., Dubossarsky, H., & Behrens, T. E. J. (2023). The human brain reactivates context-specific past information at event boundaries of naturalistic experiences. Nature Neuroscience. DOI
Xi, N., Zhao, S., Wang, H., Liu, C., Qin, B., & Liu, T. (2023). UniCoRN: Unified Cognitive Signal ReconstructioN bridging cognitive signals and human language. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 13277–13291). link
Bethlehem, R. A., Seidlitz, J., White, S. R., Vogel, J. W., Anderson, K. M., Adamson, C., … & Schaare, H. L. (2022). Brain charts for the human lifespan. Nature, 604(7906), 525–533. DOI
Liu, X., Zhou, M., Shi, G., Du, Y., Zhao, L., Wu, Z., Liu, D., Liu, T., & Hu, X. (2023). Coupling artificial neurons in BERT and biological neurons in the human brain. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 37, No. 7, pp. 8888–8896). DOI
Caucheteux, C., Gramfort, A., & King, J. R. (2023). Evidence of a predictive coding hierarchy in the human brain listening to speech. Nature Human Behaviour, 7, 430–441. DOI
Oota, S. R., Marreddy, M., Gupta, M., & Bapi, R. S. (2023). How does the brain process syntactic structure while listening? In Findings of the Association for Computational Linguistics: ACL 2023 (pp. 6624–6647). Association for Computational Linguistics. DOI
Chang, C. H. C., Nastase, S. A., & Hasson, U. (2022). Information flow across the cortical timescales hierarchy during narrative construction. Proceedings of the National Academy of Sciences, 119(51), e2209307119. DOI
Oota, S. R., Gupta, M., & Toneva, M. (2023). Joint processing of linguistic properties in brains and language models. In Oh, A., Neumann, T., Globerson, A., Saenko, K., Hardt, M., & Levine, S. (Eds.), Advances in Neural Information Processing Systems 36 (pp. 18001–18014). link
Dufumier, B., Grigis, A., Victor, J., Ambroise, C., Frouin, V., & Duchesnay, E. (2022). OpenBHB: a large-scale multi-site brain MRI data-set for age prediction and debiasing. NeuroImage, 263, 119637. DOI
Caucheteux, C., Gramfort, A., & King, J. R. (2022). Deep language algorithms predict semantic comprehension from brain activity. Scientific Reports, 12, 16327. DOI
Thomas, A. W., Ré, C., & Poldrack, R. A. (2022). Self-supervised learning of brain dynamics from broad neuroimaging data. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, & A. Oh (Eds.), Advances in Neural Information Processing Systems (pp. 21255–21269). Curran Associates, Inc. link
de la Vega, A., Rocca, R., Blair, R. W., Markiewicz, C. J., Mentch, J., Kent, J. D., Herholz, P., Ghosh, S. S., Poldrack, R. A., & Yarkoni, T. (2022). Neuroscout, a unified platform for generalizable and reproducible fMRI research. eLife. DOI
Millet, J., Caucheteux, C., Orhan, P., Boubenec, Y., Gramfort, A., Dunbar, W., Pallier, C., & King, J.-R. (2022). Toward a realistic model of speech processing in the brain with self-supervised learning. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, & A. Oh (Eds.), Advances in Neural Information Processing Systems (pp. 33428–33443). Curran Associates, Inc. link
Schmälzle, R., Wilcox, S., & Jahn, N. T. (2022). Identifying moments of peak audience engagement from brain responses during story listening. Communication Monographs, 89(4), 515–538. DOI
Mennen, A. C., Nastase, S. A., Yeshurun, Y., Hasson, U., & Norman, K. A. (2022). Real-time neurofeedback to alter interpretations of a naturalistic narrative. NeuroImage: Reports, 2(3), 100111. DOI
Kumar, M., Anderson, M. J., Antony, J. W., Baldassano, C., Brooks, P. P., Cai, M. B., Chen, P.-H. C., Ellis, C. T., Henselman-Petrusek, G., Huberdeau, D., Hutchinson, J. B., Li, P. Y., Lu, Q., Manning, J. R., Mennen, A. C., Nastase, S. A., Richard, H., Schapiro, A. C., Schuck, N. W., Shvartsman, M., Sundaraman, N., Suo, D., Turek, J. S., Turner, D. M., Vo, V. A., Wallace, G., Wang, Y., Williams, J. A., Zhang, H., Zhu, X., Capota, M., Cohen, J. D., Hasson, U., Li, K., Ramadge, P. J., Turk-Browne, N. B., Willke, T. L., & Norman, K. A. (2021). BrainIAK: The Brain Imaging Analysis Kit. Aperture Neuro, 1(4). DOI
Dominey, P. F. (2021). Narrative event segmentation in the cortical reservoir. PLOS Computational Biology, 17(10), e1008993. DOI
Caucheteux, C., Gramfort, A., & King, J. R. (2021). Model-based analysis of brain activity reveals the hierarchy of language in 305 subjects. In Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic. link
Caucheteux, C., Gramfort, A., & King, J. R. (2021). Disentangling syntax and semantics in the brain with deep networks. In M. Meila & T. Zhang (Eds.), Proceedings of the 38th International Conference on Machine Learning: Vol. 139 (pp. 1336–1348). Proceedings of Machine Learning Research (PMLR). link
Nastase, S. A., Liu, Y.-F., Hillman, H., Norman, K. A., & Hasson, U. (2020). Leveraging shared connectivity to aggregate heterogeneous datasets into a common response space. NeuroImage, 116865. DOI
Chien, H.-Y. S., & Honey, C. J. (2020). Constructing and forgetting temporal context in the human cerebral cortex. Neuron, 106. DOI
Nastase, S. A., Gazzola, V., Hasson, U., & Keysers, C. (2019). Measuring shared responses across subjects using intersubject correlation. Social Cognitive and Affective Neuroscience, 14(6), 667–685. DOI
Lin, X., Sur, I., Nastase, S. A., Divakaran, A., Hasson, U., & Amer, M. R. (2019). Data-efficient mutual information neural estimator. arXiv, arXiv:1905.03319. DOI

Sam Nastase
