Sampling distribution of cosine similarity
4 months ago

I am working with a non-negative dataset and want to test the significance of the cosine similarity between variables. I randomized the data and built a null distribution of cosine similarity. For some variable pairs the null distribution looks approximately normal, so I can fit a normal distribution and get a p-value for the observed cosine similarity. For other pairs, though, the null distribution is concentrated near 0 or 1 and is extremely skewed, so a normal fit is not appropriate. It looks like I need something like the Fisher z-transformation (generally used for Pearson's r) here.
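For reference, the randomization step described above could look roughly like the sketch below. The post does not say how the data were randomized, so shuffling one variable relative to the other is an assumption, as are the function names and the simulated gamma-distributed toy data. An empirical p-value taken directly from the null avoids fitting any distribution, at the cost of a resolution limited by the number of permutations.

```python
import numpy as np

def cosine_similarity(x, y):
    """Cosine similarity of two 1-D arrays (0.0 if either is all zeros)."""
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0

def permutation_null(x, y, n_perm=2000, seed=0):
    """Null cosine similarities from shuffling y relative to x (assumed scheme)."""
    rng = np.random.default_rng(seed)
    return np.array([cosine_similarity(x, rng.permutation(y)) for _ in range(n_perm)])

def empirical_p_value(observed, null):
    """One-sided (upper-tail) p-value with the usual +1 correction."""
    return (1 + np.sum(null >= observed)) / (1 + len(null))

# Toy non-negative data, simulated purely for illustration
rng = np.random.default_rng(1)
x = rng.gamma(shape=2.0, scale=1.0, size=50)
y = x + rng.gamma(shape=2.0, scale=0.5, size=50)

observed = cosine_similarity(x, y)
null = permutation_null(x, y)
p_emp = empirical_p_value(observed, null)
print(f"observed={observed:.3f}, empirical p={p_emp:.4f}")
```

With 2000 permutations the smallest attainable p-value is 1/2001, which is one reason a parametric fit to the null can still be attractive.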

Option 1: Rescale and shift my cosine similarity values from the range [0, 1] to [-1, 1], then use the Fisher z-transformation to test the significance.
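A minimal sketch of Option 1, assuming the mapping r = 2c - 1 and a normal fit to the transformed null. The function names are hypothetical, and the Beta(20, 2) draws are only a simulated stand-in for a skewed null distribution near 1. Note that arctanh(2c - 1) equals half the logit of c, so this is effectively a logit transform.

```python
import math
import numpy as np

def fisher_z_rescaled(c, eps=1e-9):
    """Map cosine similarity in [0, 1] to [-1, 1], then apply arctanh."""
    r = 2.0 * np.asarray(c, dtype=float) - 1.0
    r = np.clip(r, -1.0 + eps, 1.0 - eps)  # keep arctanh finite at the bounds
    return np.arctanh(r)

def normal_upper_tail_p(observed, null):
    """Upper-tail p-value from a normal fitted to the transformed null."""
    z_null = fisher_z_rescaled(null)
    z_obs = float(fisher_z_rescaled(observed))
    z_score = (z_obs - z_null.mean()) / z_null.std(ddof=1)
    # Survival function of the standard normal via the complementary error function
    return 0.5 * math.erfc(z_score / math.sqrt(2.0))

# Simulated stand-in for a skewed null concentrated near 1 (not real data)
rng = np.random.default_rng(0)
null = rng.beta(20.0, 2.0, size=2000)

p_fisher = normal_upper_tail_p(0.99, null)
print(f"p-value after rescaled Fisher z: {p_fisher:.4g}")
```

Whether the transformed null is actually close to normal should be checked per variable pair, for example with a Q-Q plot, before trusting the tail probability.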

Option 2: Fit a bounded distribution such as the beta distribution (supported on [0, 1]) to the null distribution of cosine similarity values.
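A minimal sketch of Option 2 using `scipy.stats.beta`, with location and scale fixed so the support stays on [0, 1]. The function name is hypothetical, and the simulated Beta(20, 2) null is again only a stand-in for a real permutation null.

```python
import numpy as np
from scipy import stats

def beta_upper_tail_p(observed, null, eps=1e-9):
    """Fit a beta distribution to the null and return (p_value, a_hat, b_hat)."""
    # Exact 0s or 1s break the maximum-likelihood fit, so nudge them inward
    null = np.clip(np.asarray(null, dtype=float), eps, 1.0 - eps)
    # floc=0 and fscale=1 fix the support to [0, 1]; only a and b are estimated
    a_hat, b_hat, _, _ = stats.beta.fit(null, floc=0, fscale=1)
    p_value = float(stats.beta.sf(observed, a_hat, b_hat))
    return p_value, a_hat, b_hat

# Simulated stand-in for a skewed null concentrated near 1 (not real data)
rng = np.random.default_rng(0)
null = rng.beta(20.0, 2.0, size=2000)

p_beta, a_hat, b_hat = beta_upper_tail_p(0.99, null)
print(f"a={a_hat:.2f}, b={b_hat:.2f}, upper-tail p={p_beta:.4g}")
```

The beta fit handles skew toward either bound without a transformation, but the fit itself should still be checked against the permuted values before the tail probability is used.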

Suggestions, please. Thanks.

RNA-seq • 324 views
4 months ago
james.hawley ▴ 80

This may be a relevant paper for you: https://arxiv.org/abs/2310.13994. It discusses the moments of the cosine similarity for two IID variables with finite means and covariances. This should give you some theoretical basis for a null distribution, if you know or can assume the distribution of your observed data points.
