Hi,
I have a continuous variable A that is not sparse, and a group of continuous variables that are very sparse (some of them have only one non-zero value). I want to calculate the correlation between variable A and each variable in the group. I used cor.test() from the R package stats, whose default method is Pearson's correlation. However, the results do not look trustworthy: based on the p-value, a variable with only one non-zero value shows the most significant correlation with variable A.
I wonder if I'm using the wrong test for this type of data. What is a better way to calculate these correlations?
Thanks!
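To illustrate why a variable with a single non-zero value can look so strongly correlated, here is a minimal sketch in plain Python. The numbers are made up for illustration (they are not the poster's data), and `pearson` is a hypothetical helper written out by hand. When the other variable is zero everywhere except at one position k, Pearson's r reduces to (A_k - mean(A)) / (sd(A) * sqrt(n - 1)), with sd the population standard deviation, so it depends only on how extreme A happens to be at that one observation.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation coefficient (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up dense variable A; the value at index 7 is an outlier.
a = [2.1, 1.9, 2.4, 2.0, 2.2, 1.8, 2.3, 9.5, 2.0, 2.1]

# Made-up sparse variable: zero everywhere except at index 7.
k = 7
sparse = [0.0] * len(a)
sparse[k] = 3.0

r = pearson(a, sparse)

# Closed form for the single non-zero case: only A[k] matters.
n = len(a)
mean_a = sum(a) / n
pop_sd_a = math.sqrt(sum((v - mean_a) ** 2 for v in a) / n)
r_closed = (a[k] - mean_a) / (pop_sd_a * math.sqrt(n - 1))

print(r, r_closed)
```

In this toy case the whole correlation is driven by one data point, which is one plausible reason the p-value looks more significant than it deserves.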
I'm not sure whether this has a solid foundation in statistics.
I suggest you try doing a singular value decomposition on both datasets, then take the first 10 components and calculate the correlations of those vectors.
Hmm, perhaps try a distance metric like mean squared deviation?
Thanks for the suggestion! Could you please give me some more details on how to do this?
Thanks so much!
You will need truncated SVD for sparse data. Take your data matrix, choose the number of components (I suggest 5-10), and that is pretty much it.
https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html
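A rough sketch of what that could look like with scikit-learn's TruncatedSVD is below. The data here is randomly generated, and the shapes (200 samples, 50 sparse variables), the density, and the variable names are all assumptions for illustration. It reduces the sparse matrix to 5 components and then computes the Pearson correlation of A with each component, which is one possible reading of the suggestion above rather than an established method.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)

# Assumed layout: rows are samples, columns are the sparse variables.
n_samples, n_sparse_vars = 200, 50
X = sparse.random(n_samples, n_sparse_vars, density=0.02,
                  format="csr", random_state=0)

# Made-up dense variable A, one value per sample.
A = rng.normal(size=n_samples)

# TruncatedSVD works directly on sparse matrices (no centering step).
svd = TruncatedSVD(n_components=5, random_state=0)
components = svd.fit_transform(X)  # shape: (n_samples, 5)

# Pearson correlation between A and each SVD component.
corrs = [np.corrcoef(A, components[:, j])[0, 1]
         for j in range(components.shape[1])]

for j, c in enumerate(corrs):
    print(f"component {j}: r = {c:.3f}")
```

Note that each component mixes many of the original sparse variables, so these correlations describe A against combinations of variables, not against any single one.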