Question

Fisher exact test on different gene sets

18 months ago

sleepystudent • 0

Hey everyone!

I am working on a project exploring the role of one specific transcription factor (TF) in acquired chemoresistance. I have Illumina transcriptome data, which has been processed with DESeq2 to find differentially expressed genes (DEGs). The resulting table contains only genes whose gene symbols are present in the GTF annotation (about 20000 genes).

To support my hypothesis I found another dataset with expression data measured on Illumina BeadChip arrays and processed it with the limma package. The resulting table has expression data for about 40000 genes, only 20000 of which are in the GTF annotation. This dataset has 2 factors: knockdown of the transcription factor of interest and chemotherapeutic drug treatment.

I would like to find out whether this TF explains the acquired chemoresistance by running 2 Fisher's exact tests (FETs): one on the intersection of the DEGs in my data with the DEGs from this dataset upon treatment with a functioning TF, and another on the intersection of the DEGs in my data with the DEGs from this dataset upon treatment with the TF knocked down. I expect the first intersection to be significant and the second not to be.

I found that if I restrict the DEGs from the second dataset to genes that are present in the GTF, I get the expected results. However, using all genes from the second dataset gives non-significant results in both tests. Should I filter the DEGs by the GTF, then?

Thank you in advance!

fisher-test • 692 views

updated 18 months ago by Ram 45k • written 18 months ago by sleepystudent • 0

There are 2 simple approaches which you could try:

1. For the 2 independent datasets, filter the expression matrices down to the genes measured by both the arrays and the RNA-seq. Then run the differential expression analysis on each (with DESeq2 or limma, as appropriate) and get the DEGs per dataset. Because both analyses now share the same background of genes, you can do the Fisher test.
2. Alternatively, run the differential analyses with different gene backgrounds for the arrays and the RNA-seq, then do a permutation test: randomly sample X genes from the background of the first dataset and X genes from the background of the second (X being the number of DEGs in each), measure the intersect, and repeat this many times. This gives you an empirical distribution of the "expected intersect" against which you can compare the real intersect between the DEGs.
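
To make the two approaches a bit more concrete, here is a rough sketch in Python using only the standard library. The function names, the toy gene IDs and the default number of permutations are all made up for illustration, and the Fisher test is written as the one-sided (enrichment) hypergeometric tail; in practice you would probably call `fisher.test` in R or `scipy.stats.fisher_exact` instead.

```python
import random
from math import comb


def fisher_overlap_pvalue(degs_a, degs_b, background):
    """Approach 1: one-sided Fisher exact test on a shared background.

    Returns (overlap size, P(overlap >= observed)) from the
    hypergeometric distribution. Genes outside the background are dropped.
    """
    background = set(background)
    a = set(degs_a) & background
    b = set(degs_b) & background
    n_bg, n_a, n_b = len(background), len(a), len(b)
    k = len(a & b)
    # Upper tail: sum of hypergeometric probabilities for overlaps >= k
    tail = sum(
        comb(n_a, i) * comb(n_bg - n_a, n_b - i)
        for i in range(k, min(n_a, n_b) + 1)
    )
    return k, tail / comb(n_bg, n_b)


def permutation_overlap_pvalue(degs_a, background_a, degs_b, background_b,
                               n_perm=10000, seed=0):
    """Approach 2: permutation test with a separate background per dataset.

    Each round draws as many random genes as there are DEGs from each
    dataset's own background and records the size of the intersect.
    """
    rng = random.Random(seed)
    bg_a, bg_b = sorted(set(background_a)), sorted(set(background_b))
    a, b = set(degs_a), set(degs_b)
    observed = len(a & b)
    hits = 0
    for _ in range(n_perm):
        sim_a = set(rng.sample(bg_a, len(a)))
        sim_b = set(rng.sample(bg_b, len(b)))
        if len(sim_a & sim_b) >= observed:
            hits += 1
    # Add-one correction so the empirical p-value is never exactly zero
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy example: hypothetical gene IDs, not real data
    rnaseq_bg = [f"g{i}" for i in range(100)]
    array_bg = [f"g{i}" for i in range(200)]
    rnaseq_degs = [f"g{i}" for i in range(0, 10)]
    array_degs = [f"g{i}" for i in range(5, 15)]

    common_bg = set(rnaseq_bg) & set(array_bg)
    print(fisher_overlap_pvalue(rnaseq_degs, array_degs, common_bg))
    print(permutation_overlap_pvalue(rnaseq_degs, rnaseq_bg,
                                     array_degs, array_bg, n_perm=2000))
```

Note that in the Fisher version the background is the set of genes that could have been called DE in both datasets, which is why the choice of background changes the p-value so much.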

There are also other methods, such as the R package RobustRankAggreg, which you can use to compare gene lists that come from different backgrounds and technologies.