I am trying to find out whether the two genes in a pair are related or were just paired at random. To do that, I am checking how many GO terms the two genes share (after some filtering, of course). Does this look like a good approach? I have run Fisher's exact test and it comes back significant, but it is also significant for randomly generated pairs.
Coming back to this... I am having some trouble because, as I expected, when I take the overlap of GO terms for a given pair of genes and calculate metrics such as the Jaccard index, the overlap coefficient or the odds ratio, they always come out significant whether the pair was random or not. I feel like that is "obvious": GO terms are very broad, so the fact that two genes, random or not, share many GO terms doesn't necessarily mean they are biologically related... I would like to know what you guys think, and whether I should look for alternatives to demonstrate whether my observed gene pairs are biologically relevant. Thanks in advance.
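For reference, this is roughly how I compute the two set-overlap metrics. It is a minimal sketch: the function names and the toy annotation sets are made up for illustration and are not my real data.

```python
def jaccard_index(a: set, b: set) -> float:
    """Shared terms divided by all terms seen in either gene."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_coefficient(a: set, b: set) -> float:
    """Shared terms divided by the size of the smaller annotation set."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


# Toy GO annotation sets for two genes (illustrative only).
go_a = {"GO:0008150", "GO:0006915", "GO:0005515"}
go_b = {"GO:0008150", "GO:0005515", "GO:0003677", "GO:0005634"}

print(jaccard_index(go_a, go_b))        # 2 shared / 5 total
print(overlap_coefficient(go_a, go_b))  # 2 shared / 3 in the smaller set
```

Note that very broad terms (the two shared ones in this toy example) inflate both scores, which is the problem I describe above.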
Have you tried using the percentile in the distribution of Jaccard indices as the metric of randomness, as I suggested?
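Something along these lines, as a rough sketch. It assumes the reference distribution is built from randomly drawn gene pairs taken from the same annotation table; the gene names, the `annotations` dictionary and all function names are hypothetical placeholders, not anything from your data.

```python
import random


def jaccard_index(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def null_jaccard_distribution(annotations, n_pairs=1000, seed=0):
    """Jaccard indices for randomly drawn gene pairs (the null distribution)."""
    rng = random.Random(seed)
    genes = sorted(annotations)
    scores = []
    for _ in range(n_pairs):
        g1, g2 = rng.sample(genes, 2)  # two distinct genes
        scores.append(jaccard_index(annotations[g1], annotations[g2]))
    return scores


def empirical_percentile(observed, null_scores):
    """Percentage of random pairs scoring strictly below the observed pair."""
    below = sum(s < observed for s in null_scores)
    return 100.0 * below / len(null_scores)


def empirical_p_value(observed, null_scores):
    """One-sided empirical p-value with the usual +1 correction."""
    at_least = sum(s >= observed for s in null_scores)
    return (at_least + 1) / (len(null_scores) + 1)


# Hypothetical annotation table: gene -> set of GO terms.
annotations = {
    "geneA": {"GO:0008150", "GO:0006915", "GO:0005515"},
    "geneB": {"GO:0008150", "GO:0006915", "GO:0005634"},
    "geneC": {"GO:0008150", "GO:0003677"},
    "geneD": {"GO:0008150", "GO:0005515", "GO:0016020"},
    "geneE": {"GO:0008150", "GO:0005634", "GO:0003677"},
}

null_scores = null_jaccard_distribution(annotations, n_pairs=500, seed=1)
observed = jaccard_index(annotations["geneA"], annotations["geneB"])
print(empirical_percentile(observed, null_scores))
print(empirical_p_value(observed, null_scores))
```

The point is that the pair is judged against what random pairs score under the same broad GO terms, rather than against zero overlap, so terms shared by nearly every gene stop driving the result. With real data you would want many more random pairs, and possibly random pairs matched on the number of annotations per gene.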