Which statistical test shows that the overlap between two gene lists is significant?
5.0 years ago
archisman • 0

Suppose I have a list of 470 genes that are induced in my study. Other studies have already shown that about 1000 genes are involved in the same kinds of pathways, but in different model systems. When I overlapped those 1000 genes with my 470 genes, I found that at least 165 of the 470 are shared. Which statistical test should I use to show that this overlap is not due to chance alone?

RNA-Seq R • 988 views

I suggest LOLA (BioC package) for this task: https://bioconductor.org/packages/release/bioc/html/LOLA.html


Hypergeometric test? See the R help page Hypergeometric {stats} (?phyper).

5.0 years ago
russhh 5.7k

One issue here is that you need to define the gene universe before you can make this comparison, and you need to restrict your gene counts to the genes that could have been assessed in both studies. For example, although you have 470 induced genes, not all of them may have been measured in the other studies.

Once you have the gene universe, and have restricted your gene counts to it, the standard approach is Fisher's exact test.
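
As a rough sketch of those two steps (restrict both lists to the shared universe, then test), the Python below computes the one-sided Fisher's exact p-value, which is the upper tail of the hypergeometric distribution. The function name `overlap_pvalue` and the gene identifiers are made up for illustration and are not data from the question. In R, the same tail should be given by `phyper(k - 1, K, N - K, n, lower.tail = FALSE)` or by `fisher.test(..., alternative = "greater")` on the corresponding 2x2 table.

```python
from fractions import Fraction
from math import comb


def overlap_pvalue(universe, mine, theirs):
    """One-sided Fisher's exact test (hypergeometric upper tail) for an overlap.

    All arguments are iterables of gene identifiers. Both gene lists are
    restricted to the shared universe before counting.
    """
    universe = set(universe)
    mine = set(mine) & universe      # drop genes that could not be assessed
    theirs = set(theirs) & universe

    N = len(universe)        # genes assessable in both studies
    n = len(mine)            # my induced genes within the universe
    K = len(theirs)          # published genes within the universe
    k = len(mine & theirs)   # observed overlap

    # P(X >= k) for X ~ Hypergeometric(N, K, n), summed with exact integers
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return k, float(Fraction(hits, comb(N, n)))


# Toy example with hypothetical gene IDs (not real data)
universe = [f"g{i}" for i in range(10)]
mine = ["g0", "g1", "g2", "g3", "g4", "not_measured"]
theirs = ["g1", "g2", "g3", "g4", "g9"]

k, p = overlap_pvalue(universe, mine, theirs)
print(f"overlap = {k}, one-sided p = {p:.4g}")
```

With the real data, the counts after restriction would replace the toy lists; the 470, 1000 and 165 from the question can only be used directly if every one of those genes lies in the shared universe.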


(Although it's not perfect: the false-positive rate isn't constant across the range of expression levels in RNA-Seq datasets, and some gene pairs are 'technically correlated' because of sequence similarity, etc. To mitigate the first of these, I'd recommend rerunning the test with several choices of RNA-Seq significance threshold and several choices of detectability cutoff.)
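
One way such a rerun could look is sketched below. It assumes a hypothetical table of per-gene results (adjusted p-value and mean expression) and treats the detectability cutoff as a minimum mean expression that defines the universe. The names `sweep` and `upper_tail` and all values are illustrative only.

```python
from fractions import Fraction
from itertools import product
from math import comb


def upper_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n), computed exactly."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return float(Fraction(hits, comb(N, n)))


def sweep(results, reference, alphas, min_exprs):
    """Repeat the overlap test for each (significance, detectability) pair.

    results:   {gene: (adjusted_p, mean_expression)}  (hypothetical layout)
    reference: set of genes reported by the other studies
    """
    rows = []
    for alpha, min_expr in product(alphas, min_exprs):
        # Universe: genes considered detectable at this cutoff
        universe = {g for g, (_, expr) in results.items() if expr >= min_expr}
        # Induced genes: significant at this threshold, within the universe
        induced = {g for g in universe if results[g][0] < alpha}
        ref = reference & universe
        k = len(induced & ref)
        p = upper_tail(len(universe), len(ref), len(induced), k)
        rows.append((alpha, min_expr, len(universe), len(induced), k, p))
    return rows


# Toy values, not real data
results = {
    "g0": (0.001, 50.0), "g1": (0.004, 8.0), "g2": (0.03, 120.0),
    "g3": (0.20, 3.0), "g4": (0.60, 40.0), "g5": (0.008, 15.0),
    "g6": (0.90, 1.0), "g7": (0.04, 9.0),
}
reference = {"g0", "g1", "g2", "g9"}

rows = sweep(results, reference, alphas=[0.01, 0.05], min_exprs=[0.0, 10.0])
for alpha, min_expr, N, n, k, p in rows:
    print(f"alpha={alpha} min_expr={min_expr} N={N} n={n} k={k} p={p:.4g}")
```

If the overlap stays significant across the whole grid, the conclusion is less likely to be an artifact of one particular threshold choice.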
