How to select pairs to have a significant sample for similarity
5.1 years ago

Dear all,

I have a list of biological entities (say, genes) and I would like to compute all unique pairs (e.g. (A,B,C) -> (A,B), (A,C), (B,C)) and then calculate the similarity of each pair based on their GO annotations. So far so good, but there are about 2000 entities, so the number of unique pairs is about 2 million (2000 choose 2 = 1,999,000), and computing the similarity for all of them would take months. I started computing the similarity for all pairs, and it took 3 weeks to process 68,000 pairs. I use the GAPGOM method as the similarity measure.
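
As a rough check on the scale, the figures quoted in the post can be extrapolated. This is only a back-of-the-envelope sketch: it assumes the throughput stays constant at the observed rate, and the variable names are illustrative.

```python
from math import comb

n_genes = 2000
n_pairs = comb(n_genes, 2)  # number of unordered pairs of distinct genes

# Throughput reported in the post: 68,000 pairs in 3 weeks.
pairs_done = 68_000
weeks_spent = 3

# Linear extrapolation, assuming the per-pair cost does not change.
weeks_total = n_pairs * weeks_spent / pairs_done

print(f"pairs: {n_pairs:,}")
print(f"estimated total: {weeks_total:.1f} weeks (~{weeks_total / 52:.1f} years)")
```

At that rate the full run would take well over a year, so either sampling pairs or a faster similarity computation is needed.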

Can you suggest a sound technique for sampling pairs so that the results are statistically significant?
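
One simple option is uniform random sampling of unordered pairs, without ever enumerating all of them. The sketch below is one possible way to do that, not a recommendation from the thread; the function name `sample_unique_pairs`, the gene labels, and the sample size of 5000 are all made up for illustration.

```python
import random

def sample_unique_pairs(items, k, seed=0):
    """Draw k distinct unordered pairs uniformly at random from items."""
    n = len(items)
    max_pairs = n * (n - 1) // 2
    if k > max_pairs:
        raise ValueError(f"only {max_pairs} unique pairs exist, asked for {k}")
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    chosen = set()
    while len(chosen) < k:
        # Two distinct indices; store them in sorted order so (i, j) and
        # (j, i) count as the same pair.
        i, j = rng.sample(range(n), 2)
        chosen.add((min(i, j), max(i, j)))
    return [(items[i], items[j]) for i, j in sorted(chosen)]

# Hypothetical gene identifiers standing in for the real list.
genes = [f"gene{i}" for i in range(2000)]
pairs = sample_unique_pairs(genes, 5000)
print(len(pairs), pairs[:3])
```

Rejection sampling like this is efficient as long as the requested sample is small relative to the total number of pairs, which is the case here. If some genes or GO branches matter more than others, a stratified scheme may be more appropriate than a uniform one.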

Thank you in advance!

similarity gene ontology sampling pairs • 1.2k views

I would like to compute all unique pairs (e.g. (A,B,C) -> (A,B), (A,C), (B,C)) and then calculate their similarity based on their GO.

Why are you doing this with so many pairs? I don't think the GAPGOM package was designed with this in mind.


Because I am working on a similarity function and I want to test how well it correlates with GO similarity. I can't really do that with a small number of gene pairs, because the result would not be significant.
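
For a sense of how many pairs "significant" might require, a standard approximation based on the Fisher z-transformation gives the sample size needed to detect a given correlation. The sketch below is a rough guide under stated assumptions: it treats the pairs as independent observations (which is not strictly true, since pairs share genes), it targets a Pearson-type correlation, and the function name and the default alpha and power values are illustrative choices.

```python
from math import atanh, ceil
from statistics import NormalDist

def pairs_needed(r, alpha=0.05, power=0.8):
    """Approximate sample size to detect correlation r (two-sided test).

    Uses the Fisher z approximation:
        n ~ ((z_{1-alpha/2} + z_{power}) / atanh(r))^2 + 3
    """
    if not 0 < abs(r) < 1:
        raise ValueError("r must be non-zero and strictly between -1 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(((z_alpha + z_power) / atanh(abs(r))) ** 2 + 3)

for r in (0.05, 0.1, 0.3, 0.5):
    print(f"expected correlation {r}: about {pairs_needed(r)} pairs")
```

Under these assumptions, even weak correlations become detectable with far fewer pairs than the full set, so a random sample of a few thousand pairs may already be enough to test the correlation.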


On the point of comparing a gene similarity measure with GO similarity, you may be interested in this paper, on which I collaborated.


How do you go about the computation? Typical semantic similarity measures such as Resnik's, computed across the GO biological process domain, should take a few hours for all ~20,000 protein-coding human genes, even without parallelization.
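
To illustrate why a Resnik-style measure is cheap, here is a minimal sketch on a toy ontology. The term names, the gene annotations, and the choice to score a gene pair by its most informative shared term are all assumptions made for the example; this is not the GAPGOM method, and real GO tools differ in how they estimate information content and combine term-level scores.

```python
import math

# Hypothetical toy ontology: term -> list of parent terms.
parents = {
    "root": [],
    "metabolism": ["root"],
    "signaling": ["root"],
    "lipid_metabolism": ["metabolism"],
    "sugar_metabolism": ["metabolism"],
}

# Hypothetical direct annotations: gene -> set of terms.
annotations = {
    "A": {"lipid_metabolism"},
    "B": {"sugar_metabolism"},
    "C": {"signaling"},
    "D": {"lipid_metabolism"},
}

def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

# Propagate annotations upward: a gene annotated to a term is also
# annotated to every ancestor of that term.
gene_terms = {
    gene: set().union(*(ancestors(t) for t in terms))
    for gene, terms in annotations.items()
}

# Information content: IC(t) = -log(fraction of genes annotated to t).
n_genes = len(gene_terms)
ic = {}
for term in parents:
    count = sum(term in terms for terms in gene_terms.values())
    if count:
        ic[term] = -math.log(count / n_genes)

def resnik(gene1, gene2):
    """IC of the most informative term shared by both genes."""
    common = gene_terms[gene1] & gene_terms[gene2]
    return max(ic[t] for t in common)

for pair in [("A", "D"), ("A", "B"), ("A", "C")]:
    print(pair, round(resnik(*pair), 3))
```

Once the ancestor sets and information content values are precomputed, each pair costs only a set intersection and a maximum, which is why this kind of measure scales to all pairs of a genome-sized gene list.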
