Hey everyone!
I am working on a project exploring the role of one specific transcription factor (TF) in acquired chemoresistance. I have Illumina transcriptome data, which has been processed with DESeq2 to find differentially expressed genes (DEGs). The resulting table contains only genes whose gene symbols are present in the GTF annotation (about 20,000 genes).
To support my hypothesis, I found another dataset with expression data from an Illumina BeadChip array and processed it with the limma package. The resulting table has expression data for about 40,000 genes, only 20,000 of which are in the GTF annotation. This dataset has 2 factors: knockdown of my TF of interest and chemotherapeutic drug treatment.
I would like to find out whether this TF explains acquired chemoresistance by running 2 Fisher's exact tests (FETs): one intersecting the DEGs in my data with the DEGs from this dataset upon treatment with a functioning TF, and another intersecting the DEGs in my data with the DEGs from this dataset upon treatment with the TF knocked down. I expect the first intersection to be significant and the second not to be.
I found that if I select DEGs from the second dataset using only genes that are present in the GTF, I get the expected results. However, using all genes from the second dataset gives non-significant results in both tests. Should I filter the DEGs by the GTF, then?
Thank you in advance!
There are 2 simple approaches which you could try:
There are other methods, such as the R package RobustRankAggreg, which you can use to compare lists of genes which come from different backgrounds and technologies.
Thank you for your advice! I will definitely try it!
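On the original question about filtering: for an overlap test like this, the background (universe) is usually taken to be the genes that could have been detected in both experiments, and both DEG lists are restricted to that universe before counting. Below is a minimal sketch of that idea in plain Python, assuming a one-sided Fisher's exact test for enrichment of the overlap. The function name overlap_fisher_p and the toy gene sets are hypothetical and only for illustration; they are not taken from either dataset in this thread.

```python
from math import comb


def overlap_fisher_p(degs_a, degs_b, universe):
    """One-sided Fisher's exact test for overlap enrichment.

    Both DEG lists are first restricted to the shared universe, so genes
    measurable in only one experiment cannot influence the result.
    Returns (overlap size, p-value for an overlap at least that large).
    """
    universe = set(universe)
    a = set(degs_a) & universe
    b = set(degs_b) & universe
    n, n_a, n_b = len(universe), len(a), len(b)
    k = len(a & b)

    # Hypergeometric upper tail: draw n_b genes from n, of which n_a are DEGs in A.
    total = comb(n, n_b)
    tail = sum(
        comb(n_a, i) * comb(n - n_a, n_b - i)
        for i in range(k, min(n_a, n_b) + 1)
    )
    return k, tail / total


# Toy example (hypothetical gene names).
measured_rnaseq = {"A", "B", "C", "D", "E", "F", "G", "H"}
measured_array = {"A", "B", "C", "D", "E", "F", "X", "Y"}
shared_universe = measured_rnaseq & measured_array  # genes testable in both

degs_rnaseq = {"A", "B", "C", "G"}  # "G" is dropped: not on the array
degs_array = {"A", "B", "X"}        # "X" is dropped: not in the RNA-seq table

k, p = overlap_fisher_p(degs_rnaseq, degs_array, shared_universe)
print(k, p)
```

The same 2x2 counts can be passed to R's fisher.test with alternative = "greater" if you prefer to stay in R. The important design choice is that the universe is defined once, from what both platforms measured, and not chosen after looking at which filter gives the expected p-values.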