Open reviews

November 25, 2022 — Open review of preprint by Dima and colleagues:

Dima, D. C., Hebart, M. N., & Isik, L. (2022). A data-driven investigation of human action representations. bioRxiv. DOI


Dima and colleagues apply non-negative matrix factorization (NMF) to representational dissimilarity matrices (RDMs) constructed from behavioral similarity judgments (multiple arrangements task) of diverse naturalistic action videos. They find that a reduced ~10-dimensional space captures the similarity judgments (based on out-of-sample reconstruction quality) and provide meaningful interpretations of many of these dimensions. I think this is a nice paper: the writing is concise and the findings are convincing. Most of my comments are relatively minor: I want to clarify some analytic choices (particularly regarding the cross-validation procedure) and make sure readers have enough information to fully understand the data and modeling.

Major comments:

I had a hard time understanding the full cross-validation scheme and why certain choices were made. First, how were the held-out test sets (9.52% and 9.43% at line 349) selected? I assume they were random subsets? My understanding is that in Experiment 2, the final test set is a single similarity matrix derived from a random sample of five subjects. I’m a little more confused about Experiment 1: my understanding is that the final test set was a random subset of similarity ratings sampled across subjects, but spanning almost all pairs; is this correct? At line 355, does “imputed” mean replaced by the mean similarity across the other pairs in that given RDM, or by the mean similarity for that pair across the other RDMs containing it? What was the motivation for using a single 10% hold-out test sample rather than a full 10-fold outer cross-validation loop?
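To make the imputation question concrete, here is a toy sketch of the two readings I have in mind (Python; the array shape and missing-value pattern are purely hypothetical):

```python
import numpy as np

# Hypothetical (n_subjects, n_pairs) array of similarity ratings, with np.nan
# marking pairs a given subject did not rate.
rdms = np.array([[0.2, np.nan, 0.7],
                 [0.4, 0.5,    np.nan],
                 [np.nan, 0.6, 0.8]])

# Reading 1: replace a missing value with the mean of the other pairs rated
# within that same RDM (a row-wise mean).
row_means = np.nanmean(rdms, axis=1, keepdims=True)
imputed_within = np.where(np.isnan(rdms), row_means, rdms)

# Reading 2: replace a missing value with the mean rating for that same pair
# across the other RDMs that contain it (a column-wise mean).
col_means = np.nanmean(rdms, axis=0, keepdims=True)
imputed_across = np.where(np.isnan(rdms), col_means, rdms)

print(imputed_within)
print(imputed_across)
```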

Second, you say “the two sparsity parameters for W and H were selected using two-fold cross-validation on two thirds of the training data” and “the best combination of sparsity parameters was applied to the remaining one third of the training data.” My understanding is that you picked the best combination of the two sparsity parameters via two-fold cross-validation on the two-thirds training subset; then you take that best combination and pair it with each dimensionality (k = 1–150 or 1–65) when predicting the remaining third of the training set. Is that correct? Or are you combining each pair of sparsity values with each dimensionality somehow? It also wasn’t clear to me whether, during this procedure, the left-out third serves as a single hold-out set or whether you perform full three-fold cross-validation.
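To be explicit, here is a rough sketch of my reading of the procedure, using scikit-learn’s NMF as a stand-in; the toy data, parameter grids, and the choice to fix k while selecting sparsity are all my own assumptions (the last of these being exactly what I’m asking about):

```python
import numpy as np
from itertools import product
from sklearn.decomposition import NMF
from sklearn.model_selection import KFold

def reconstruction_error(model, X):
    """Squared error between X and its NMF reconstruction."""
    W = model.transform(X)
    return np.sum((X - W @ model.components_) ** 2)

rng = np.random.default_rng(0)
X_train = rng.random((100, 65))          # placeholder nonnegative training data
two_thirds, last_third = X_train[:66], X_train[66:]

# Step 1: choose the (alpha_W, alpha_H) sparsity pair via two-fold
# cross-validation on two thirds of the training data (here at a fixed k,
# which is one of the points I'm unsure about).
best_err, best_sparsity = np.inf, None
for alpha_W, alpha_H in product([0.0, 0.1, 0.2, 0.4, 0.8], repeat=2):
    errs = []
    for train_idx, val_idx in KFold(n_splits=2, shuffle=True,
                                    random_state=0).split(two_thirds):
        model = NMF(n_components=10, alpha_W=alpha_W, alpha_H=alpha_H,
                    l1_ratio=1.0, init="nndsvda", max_iter=500)
        model.fit(two_thirds[train_idx])
        errs.append(reconstruction_error(model, two_thirds[val_idx]))
    if np.mean(errs) < best_err:
        best_err, best_sparsity = np.mean(errs), (alpha_W, alpha_H)

# Step 2: with the winning sparsity pair fixed, sweep the dimensionality k
# and evaluate each k on the remaining third of the training data.
errors_by_k = {}
for k in range(1, 66):
    model = NMF(n_components=k, alpha_W=best_sparsity[0],
                alpha_H=best_sparsity[1], l1_ratio=1.0,
                init="nndsvda", max_iter=500)
    model.fit(two_thirds)
    errors_by_k[k] = reconstruction_error(model, last_third)
```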

Third, you then say “five iterations of the above cross-validation for parameter selection were performed.” What’s the motivation for repeating the cross-validation procedure five times? Is the goal to account for unlucky random initializations within a given cross-validation fold? Or do you randomly re-shuffle the data before splitting on each iteration? Any additional clarifying details throughout this section of the Methods would be much appreciated, given that the sparsity and dimensionality values found via this cross-validation procedure are a core part of the results.

For Experiment 1, the correlation between the low-dimensional NMF-reconstructed RDM and the held-out test RDM is 0.19. Should we consider this high? I also don’t understand the notation “(τ_A = 0.19, true τ_A = 0.14)” reported for the test data. My understanding is that 0.19 is the correlation between the reconstructed RDM and the actual test RDM, but I don’t understand what the “true τ_A = 0.14” refers to.

If I understand correctly, your cross-validation procedure yields an optimal sparsity parameter of 0.1 (line 97) on a scale of 0 to 0.8. Doesn’t this tell us that the optimal solution isn’t actually very sparse? Is this factored into the current interpretation?

In describing how model performance was assessed in the Methods (e.g. the paragraph at line 371), I would reiterate that the performance metric is Kendall’s tau between the NMF-reconstructed RDM and the actual test RDM (right?).
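Concretely, I’m assuming something along these lines is being computed (a toy Python sketch with random placeholder RDMs; tau-a is computed by hand here since, e.g., scipy’s kendalltau defaults to tau-b):

```python
import numpy as np

def upper_triangle(rdm):
    """Vectorize the upper triangle (excluding the diagonal) of a square RDM."""
    i, j = np.triu_indices_from(rdm, k=1)
    return rdm[i, j]

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / (n choose 2); ties count as zero."""
    n = len(x)
    concordant_minus_discordant = 0.0
    for a in range(n - 1):
        concordant_minus_discordant += np.sum(
            np.sign(x[a + 1:] - x[a]) * np.sign(y[a + 1:] - y[a]))
    return concordant_minus_discordant / (n * (n - 1) / 2)

# Placeholder reconstructed (e.g. W @ H) and held-out test RDMs.
rng = np.random.default_rng(0)
reconstructed_rdm = rng.random((60, 60))
reconstructed_rdm = (reconstructed_rdm + reconstructed_rdm.T) / 2
test_rdm = rng.random((60, 60))
test_rdm = (test_rdm + test_rdm.T) / 2

score = kendall_tau_a(upper_triangle(reconstructed_rdm),
                      upper_triangle(test_rdm))
print(score)
```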

In the NMF procedure, I think I understand that the sparsity will tend to induce categorical features and the nonnegativity will tend to yield more interpretable (positive) features—although I don’t fully understand how the math bears that out. It might be worth stating this a bit more explicitly (especially early in the manuscript) for psychologists who may not be familiar with NMF.
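Even a toy demonstration might help here, e.g. something along the lines of the following sketch (scikit-learn’s NMF with an L1 penalty; the data and penalty values are my own placeholders, not the authors’ settings):

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.random((100, 30))  # nonnegative toy "similarity" data

dense_model = NMF(n_components=5, init="nndsvda", max_iter=1000, random_state=0)
sparse_model = NMF(n_components=5, init="nndsvda", max_iter=1000, random_state=0,
                   alpha_W=0.1, alpha_H=0.1, l1_ratio=1.0)  # pure L1 penalty

W_dense = dense_model.fit_transform(X)
W_sparse = sparse_model.fit_transform(X)

# The sparse solution tends to have more exactly-zero loadings, so each item
# loads on only a few dimensions (more "categorical"); and because all loadings
# are nonnegative, each dimension can only add to the reconstruction, which is
# what tends to make the dimensions read as interpretable parts.
print("zero loadings (dense): ", np.mean(W_dense == 0))
print("zero loadings (sparse):", np.mean(W_sparse == 0))
```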

I think it would be worthwhile to include an additional sentence describing the multiple arrangements task used to construct the behavioral similarity judgments early on in the paper (e.g. in the paragraphs at lines 55 or 84). In general, readers should be able to understand the data without having to refer to Dima et al., 2022 (especially because the Dima et al., 2022 paper is missing from the reference list!—at least in the journal’s version I’m reading). I also think it would be worth including the actual task instructions used verbatim in the Methods section. (We’ve found in our work that—unsurprisingly—participants will produce different arrangements based on different task instructions.)

Minor comments:

Line 319: “multi-dimensiomal” > “multi-dimensional”

Wasn’t the multiple arrangements task used here first laid out in detail by Goldstone, 1994? Might be worth citing this if you think it’s relevant.

References:

Dima, D. C., Tomita, T. M., Honey, C. J., & Isik, L. (2022). Social-affective features drive human representations of observed actions. eLife, 11, e75027. DOI

Goldstone, R. (1994). An efficient method for obtaining similarity data. Behavior Research Methods, Instruments, & Computers, 26(4), 381–386. DOI