Question

What is the best way to know if a pair of genes is random?

1

12 days ago

Ana ▴ 10

I am trying to find out whether a pair of genes is random or not. To do that, I have opted to check how many GO terms overlap between the two genes (after some filtering, of course). Does this look like a good approach? I have run Fisher's exact test and it comes back significant, but it also does on randomly generated pairs.

go-terms genes • 508 views

updated 6 hours ago by shelkmike ★ 1.5k • written 12 days ago by Ana ▴ 10

0

Coming back to this... I am having some trouble because, as I expected, when I take the overlap of GO terms for a given pair of genes and calculate metrics such as the Jaccard index, the overlap coefficient or the odds ratio, they always come out significant whether the pair was random or not. That feels "obvious" to me: GO terms are very broad, so the fact that two genes, random or not, share many GO terms doesn't necessarily mean they are biologically related. I would like to know what you think, and whether I should look for alternative ways to show that my observed gene pairs are biologically relevant. Thanks in advance.

8 hours ago by Ana ▴ 10

0

Have you tried using the percentile in the distribution of Jaccard indices as the metric of randomness, as I suggested?

6 hours ago by shelkmike ★ 1.5k

score 0 · Answer 1 · 2025-04-14

I don't know about the best way, but one method to do this, I think, could look as follows:
1) Take all possible pairs of genes and, for each pair, calculate a similarity metric, for example a Jaccard index based on the number of GO terms the two genes have in common. Since GO terms are hierarchical, each gene's term set should also include the parent terms of its annotated terms.
2) Calculate the Jaccard index for the pair of genes you are interested in.
3) Find the percentile of the distribution from step 1 into which the Jaccard index from step 2 falls. For example, 98.8% means that this pair of genes is more similar than 98.8% of random pairs. This percentile is a metric of non-randomness.
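
The three steps above could be sketched in Python roughly as follows. This is only a minimal illustration under stated assumptions, not a tested pipeline: the function names, the `annotations` mapping (gene to its directly annotated terms) and the `parents` mapping (term to its direct parent terms) are all hypothetical, and the term IDs in the example are toy placeholders rather than real GO identifiers. In practice both mappings would be loaded from a GO annotation file and the ontology itself. The Jaccard index used here is |A ∩ B| / |A ∪ B| for the two expanded term sets A and B.

```python
import random
from bisect import bisect_left
from itertools import combinations


def with_ancestors(terms, parents):
    """Return the given terms plus all of their ancestors (hypothetical helper)."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(parents.get(term, ()))
    return seen


def jaccard(a, b):
    """Jaccard index of two sets: |A & B| / |A | B| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def background_scores(annotations, parents, max_pairs=None, seed=0):
    """Step 1: sorted Jaccard indices over all gene pairs (or a random sample)."""
    expanded = {g: with_ancestors(t, parents) for g, t in annotations.items()}
    pairs = list(combinations(sorted(expanded), 2))
    if max_pairs is not None and len(pairs) > max_pairs:
        # Assumption: subsampling pairs is acceptable when there are too many.
        pairs = random.Random(seed).sample(pairs, max_pairs)
    scores = sorted(jaccard(expanded[x], expanded[y]) for x, y in pairs)
    return scores, expanded


def percentile(score, sorted_scores):
    """Step 3: percentage of background pairs with a strictly lower score."""
    return 100.0 * bisect_left(sorted_scores, score) / len(sorted_scores)


if __name__ == "__main__":
    # Toy hierarchy: D -> B -> A and E -> C -> A (A is the root).
    parents = {"D": ["B"], "B": ["A"], "E": ["C"], "C": ["A"]}
    annotations = {"g1": {"D"}, "g2": {"D"}, "g3": {"E"}, "g4": {"C"}}

    scores, expanded = background_scores(annotations, parents)
    # Step 2: score the pair of interest, then locate it in the background.
    observed = jaccard(expanded["g3"], expanded["g4"])
    print(f"Jaccard = {observed:.3f}, percentile = {percentile(observed, scores):.1f}")
```

Note that in this toy example every pair shares the root term, so no pair has a Jaccard index of zero. That is the effect described earlier in the thread, and it is why the percentile against the background is used here rather than the raw index.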