Question

Sampling distribution of cosine similarity


11 months ago

jayeshkumarsundaram ▴ 10

I am working with a non-negative dataset and trying to test the significance of the cosine similarity between variables. I randomized the data and created a null distribution of cosine similarity. For some variable pairs, the null distribution looks approximately normal, so I can fit a normal distribution to get a p-value for the observed cosine similarity. But for other pairs, the null distribution is concentrated close to 0 or 1 and is extremely skewed, so I cannot fit a normal distribution to it. It looks like I need something like the Fisher z-transformation (generally used for Pearson's r) here.
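For readers following along, the procedure described above can be sketched roughly as below. This is a minimal illustration, not the poster's actual code: the gamma-distributed `x` and `y`, the choice to shuffle only `y`, and the helper names `cosine_similarity` and `permutation_null` are all assumptions. The empirical p-value is an extra that the post does not mention; it is included only because it needs no distributional fit.

```python
import numpy as np
from scipy import stats

def cosine_similarity(x, y):
    """Cosine similarity between two 1-D arrays."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def permutation_null(x, y, n_perm=2000, seed=0):
    """Null distribution of cosine similarity obtained by shuffling y."""
    rng = np.random.default_rng(seed)
    return np.array(
        [cosine_similarity(x, rng.permutation(y)) for _ in range(n_perm)]
    )

# Hypothetical non-negative data, for illustration only.
rng = np.random.default_rng(1)
x = rng.gamma(shape=2.0, scale=1.0, size=200)
y = rng.gamma(shape=2.0, scale=1.0, size=200)

observed = cosine_similarity(x, y)
null = permutation_null(x, y)

# Normal approximation: only reasonable when the null looks symmetric.
mu, sigma = null.mean(), null.std(ddof=1)
p_normal = stats.norm.sf(observed, loc=mu, scale=sigma)

# Empirical upper-tail p-value (assumption: not part of the original post).
# The +1 correction keeps the estimate away from exactly zero.
p_empirical = (1 + np.sum(null >= observed)) / (1 + null.size)

# Sample skewness of the null; values far from 0 suggest a poor normal fit.
null_skew = stats.skew(null)
```

Checking `null_skew` per variable pair is one simple way to flag the pairs for which the normal fit should not be trusted.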

Option 1: I can rescale and shift my cosine similarity values from the range [0, 1] to [-1, 1], and then use the Fisher z-transformation to test the significance.

Option 2: Fit a distribution such as the beta distribution (bounded at both ends, with support on [0, 1]) to the null distribution of cosine similarity values.
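The two options could be compared on the same null sample along the following lines. This is only a sketch under stated assumptions: the null sample here is simulated from a beta distribution purely as a stand-in for a skewed permutation null, `observed = 0.99` is a made-up value, and the function names `fisher_z_pvalue` and `beta_pvalue` are hypothetical. Values are clipped slightly away from the boundaries because `arctanh` diverges at -1 and 1 and the beta fit needs data strictly inside (0, 1).

```python
import numpy as np
from scipy import stats

def fisher_z_pvalue(observed, null, eps=1e-12):
    """Option 1: map [0, 1] to [-1, 1], apply arctanh, then fit a normal."""
    def to_z(c):
        r = 2.0 * np.asarray(c, dtype=float) - 1.0
        return np.arctanh(np.clip(r, -1.0 + eps, 1.0 - eps))
    z_null = to_z(null)
    z_obs = to_z(observed)
    # Upper-tail p-value under a normal fitted to the transformed null.
    return float(stats.norm.sf(z_obs, loc=z_null.mean(), scale=z_null.std(ddof=1)))

def beta_pvalue(observed, null, eps=1e-12):
    """Option 2: fit a beta on [0, 1] with location and scale held fixed."""
    clipped = np.clip(null, eps, 1.0 - eps)
    a, b, _, _ = stats.beta.fit(clipped, floc=0, fscale=1)
    # Upper-tail p-value under the fitted beta distribution.
    return float(stats.beta.sf(observed, a, b)), a, b

# Hypothetical skewed null concentrated near 1, for illustration only.
rng = np.random.default_rng(0)
null = rng.beta(a=30.0, b=2.0, size=5000)
observed = 0.99  # made-up observed cosine similarity

p_z = fisher_z_pvalue(observed, null)
p_beta, a_hat, b_hat = beta_pvalue(observed, null)
```

Whether the Fisher z-transformed null is actually close to normal is not guaranteed for cosine similarity, so it would be worth checking with a Q-Q plot or a skewness estimate before relying on `p_z`.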

Suggestions, please. Thanks.

RNA-seq • 577 views

updated 11 months ago by james.hawley ▴ 80 • written 11 months ago by jayeshkumarsundaram ▴ 10

score 0 · Answer 1 · 2024-08-26

This may be a relevant paper for you: https://arxiv.org/abs/2310.13994. It discusses the moments of the cosine similarity for two IID variables with finite means and covariances. This should give you some theoretical basis for a null distribution, if you know or can assume the distribution of your observed data points.
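One way to turn theoretical moments into a usable null, assuming a beta shape as in Option 2 of the question, is to match the mean and variance. The sketch below uses the standard method-of-moments formulas for the beta distribution. The `mean` and `var` values are placeholders, not results taken from the paper, and `beta_from_moments` is a hypothetical helper name.

```python
from scipy import stats

def beta_from_moments(mean, var):
    """Beta(a, b) parameters whose mean and variance match the inputs."""
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    limit = mean * (1.0 - mean)
    if not 0.0 < var < limit:
        raise ValueError("var must lie strictly between 0 and mean * (1 - mean)")
    common = limit / var - 1.0
    return mean * common, (1.0 - mean) * common

# Placeholder moments (assumed values, for illustration only).
mean, var = 0.8, 0.01
a, b = beta_from_moments(mean, var)

# Upper-tail p-value for a made-up observed cosine similarity.
observed = 0.95
p_value = stats.beta.sf(observed, a, b)
```

The beta family is an assumption here; matching two moments does not guarantee that the tails of the true null are well approximated.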