9.7 years ago
qwzhang0601
I have a group of genes and want to test whether gene pairs from my list tend to be involved together in more biological processes than gene pairs drawn from all protein-coding genes. To do this, I want to count the number of BP GO terms shared by each gene pair. However, I am afraid this will be biased toward better-studied genes because of the hierarchical structure of GO (if two genes share a child GO term, they also share all of its higher-level ancestor terms). Is there a way to reduce this bias? Thanks
I am not sure if this is possible, but you could try a permutation test using genes that have the same number of BP terms.
E.g. Gene A and Gene B form a pair and have 10 and 30 BP GO terms, respectively. You can then randomly select pairs of genes that have 10 and 30 BP GO terms and see whether the number of GO terms they share is higher or lower than that of the pair you are studying. This might give you an empirical p-value as a reference. However, this will only work if there are many other genes with the same (or a greater) number of BP GO terms as the genes you are studying; otherwise you will not be able to generate a good enough null distribution.
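A rough sketch of that matched permutation in Python might look like the following. It assumes you already have a dict (called `annotations` here, a hypothetical name) mapping each gene to the set of BP GO term IDs annotated to it. The function names, the exact-count matching, and the add-one p-value are illustrative choices, not an established method.

```python
import random
from collections import defaultdict


def shared_terms(gene_a, gene_b, annotations):
    """Number of GO terms annotated to both genes."""
    return len(annotations[gene_a] & annotations[gene_b])


def matched_permutation_pvalue(gene_a, gene_b, annotations, n_perm=1000, seed=0):
    """Empirical p-value for the observed number of shared terms.

    The null distribution is built from random gene pairs whose GO term
    counts match those of gene_a and gene_b exactly (an assumption of
    this sketch; you may prefer to match on a range of counts).
    """
    rng = random.Random(seed)

    # Group all other genes by how many GO terms they carry.
    by_count = defaultdict(list)
    for gene, terms in annotations.items():
        if gene not in (gene_a, gene_b):
            by_count[len(terms)].append(gene)

    pool_a = by_count[len(annotations[gene_a])]
    pool_b = by_count[len(annotations[gene_b])]
    if not pool_a or not pool_b:
        raise ValueError("no other genes with a matching number of GO terms")

    observed = shared_terms(gene_a, gene_b, annotations)

    null = []
    for _ in range(n_perm):
        x = rng.choice(pool_a)
        # Avoid pairing a gene with itself when both pools overlap.
        candidates = [g for g in pool_b if g != x]
        if not candidates:
            continue
        y = rng.choice(candidates)
        null.append(shared_terms(x, y, annotations))

    if not null:
        raise ValueError("could not draw any matched random pair")

    # Add-one correction so the empirical p-value is never exactly zero.
    hits = sum(1 for s in null if s >= observed)
    return observed, (hits + 1) / (len(null) + 1)
```

If the matched pools are small, the same few pairs get drawn repeatedly and the p-value is not very informative, which is the limitation mentioned above.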