Significance between two sets of DEGs from same species
6.7 years ago
sakuraazalea ▴ 20

I am analyzing honeybee RNA-seq data from two different studies.

Study 1 had 15,314 genes total with 118 DEGs. Study 2 had 11,825 genes total with 740 DEGs. There was an overlap of 67 between the two sets of DEGs.

I want to test whether this overlap is significant. One approach I have seen is Fisher's exact test (https://rdrr.io/bioc/GeneOverlap/man/GeneOverlap.html). I am fairly sure I need to set up a 2x2 table but am unclear on the values, especially the first value, Q, below. I believe Q should equal N-(740+118-67), but I am unsure which N to use, since there are two different total gene counts (15,314 and 11,825).

fisher.test(matrix(c(Q, 740-67, 118-67, 67), nrow=2), alternative="greater")

What values should I use in this case? Thank you for sharing advice.

fisher.exact RNA-Seq • 1.6k views

The link you provided doesn't work. Fisher's exact test is normally set up from a 2x2 contingency table. I would suggest making sure you understand the contingency table first, then looking at the test itself.
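As a rough sketch of what that 2x2 table looks like for the counts in the question: the value of `N` below is only a placeholder (Study 2's total), not a recommendation, and the row and column labels are illustrative names that do not come from either study.

```r
# Counts from the question. N is a placeholder assumption, not a recommendation.
N       <- 11825   # assumed gene universe (here: Study 2's total gene count)
deg1    <- 118     # DEGs in Study 1
deg2    <- 740     # DEGs in Study 2
overlap <- 67      # DEGs shared by both studies

# matrix() fills column by column, so the first two values form column 1.
tab <- matrix(
  c(N - (deg1 + deg2 - overlap),  # not DEG in either study (Q)
    deg2 - overlap,               # DEG in Study 2 only
    deg1 - overlap,               # DEG in Study 1 only
    overlap),                     # DEG in both studies
  nrow = 2,
  dimnames = list(Study2 = c("notDEG", "DEG"),
                  Study1 = c("notDEG", "DEG"))
)

# Sanity checks: the margins must reproduce the DEG totals and the universe size.
stopifnot(sum(tab) == N,
          sum(tab[, "DEG"]) == deg1,
          sum(tab["DEG", ]) == deg2)
print(tab)
```

The margin checks are the useful part: whichever N is chosen, the Study 1 column total should be 118, the Study 2 row total should be 740, and the four cells should sum to N.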

6.7 years ago

You should first clean up each dataset by removing every gene that is not present in both studies. This can change the number of DEGs identified in each dataset. N is then the number of genes tested in both studies.
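A minimal sketch of that filtering step, assuming each study provides a vector of tested gene IDs and a vector of DEG IDs. The vectors below (`genes1`, `genes2`, `deg1`, `deg2`) are made-up toy data, not values from either study.

```r
# Hypothetical toy inputs: replace with the real gene ID vectors from each study.
genes1 <- paste0("g", 1:10)            # all genes tested in Study 1
genes2 <- paste0("g", 4:15)            # all genes tested in Study 2
deg1   <- c("g2", "g5", "g6", "g7")    # DEGs reported by Study 1
deg2   <- c("g5", "g6", "g9", "g12")   # DEGs reported by Study 2

# Keep only genes tested in both studies, then restrict each DEG list to them.
common <- intersect(genes1, genes2)
deg1_c <- intersect(deg1, common)
deg2_c <- intersect(deg2, common)

N    <- length(common)                      # shared gene universe
both <- length(intersect(deg1_c, deg2_c))   # DEGs found in both studies

# Same cell order as the fisher.test() call in the question.
tab <- matrix(c(N - length(union(deg1_c, deg2_c)),
                length(deg2_c) - both,
                length(deg1_c) - both,
                both),
              nrow = 2)

res <- fisher.test(tab, alternative = "greater")
print(tab)
print(res$p.value)
```

This sketch only filters the reported DEG lists. If the differential expression analysis is rerun on the reduced gene set, the DEG calls themselves may shift, for example because the multiple-testing correction covers fewer genes.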

6.7 years ago

Use the total number of genes in the annotation that the differential expression analysis was run against. Did you rerun both studies through the same workflow with the same annotation, or did you take the results directly from the publications? In the first case, use the total number of genes in your annotation as N and run Fisher's exact test as you described in your question:

fisher.test(matrix(c(Q, 740-67, 118-67, 67), nrow=2), alternative="greater")

In the second case, you could use the union of the 15,314 and 11,825 gene lists as N. Better still, rerun the analysis so that both datasets are processed in the same way, which avoids analysis bias.
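To get a feel for how much the choice of N matters, here is a sketch that runs the same test with each of the two totals from the question as the universe. The helper `overlap_test` is a hypothetical name introduced for illustration, and the size of the union is not given in the thread, so it is not included.

```r
# Counts from the question.
deg1    <- 118   # DEGs in Study 1
deg2    <- 740   # DEGs in Study 2
overlap <- 67    # DEGs in both

# Hypothetical helper: one-sided Fisher's exact test for a given universe size N.
overlap_test <- function(N) {
  Q <- N - (deg1 + deg2 - overlap)   # genes that are DEGs in neither study
  stopifnot(Q >= 0)
  tab <- matrix(c(Q, deg2 - overlap, deg1 - overlap, overlap), nrow = 2)
  fisher.test(tab, alternative = "greater")$p.value
}

# Candidate universes taken from the question; add the union size once known.
universes <- c(study1_total = 15314, study2_total = 11825)

p_values <- sapply(universes, overlap_test)

# Overlap expected by chance if the two DEG lists were independent.
expected <- deg1 * deg2 / universes

print(p_values)
print(expected)
```

With these counts the overlap expected by chance is about 5.7 genes for N = 15,314 and about 7.4 for N = 11,825, against 67 observed, so the conclusion is unlikely to hinge on which of these totals is used. A smaller universe gives the more conservative test.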
