Hi everyone, I have a general question concerning RNAseq analysis.
I'm running an analysis comparing 17 vs 39 biological replicates, and the goal is to identify biomarkers for these two groups. To identify DE genes I'm using DESeq2. I filter the resulting list by log2 fold change (> 1 or < -1; the genes will be used for qRT-PCR, so the difference has to be detectable by that method), by AUC > 0.7 (to identify the genes that best separate the two classes) and by TPM (to drop lowly expressed genes).
After these filters I still have a list of 54 genes. Now I'd like to reduce that number so that I can test them on a large sample set in a qRT-PCR experiment and build a proper classifier.
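For readers who want to see the three filters spelled out, here is a minimal Python sketch. It is not part of DESeq2 or of the original workflow: the function names, the toy numbers and the mean-TPM cutoff of 1.0 are assumptions made for illustration (the post does not state which TPM threshold was used). The AUC is computed as the fraction of sample pairs in which the second group has the higher value, with ties counted as one half.

```python
def auc(group_a, group_b):
    """Fraction of (a, b) pairs with b > a; ties count as 0.5."""
    wins = 0.0
    for b in group_b:
        for a in group_a:
            if b > a:
                wins += 1.0
            elif b == a:
                wins += 0.5
    return wins / (len(group_a) * len(group_b))


def passes_filters(log2fc, tpm_a, tpm_b,
                   min_abs_lfc=1.0, min_auc=0.7, min_mean_tpm=1.0):
    """Apply the fold-change, AUC and expression filters to one gene.

    min_mean_tpm=1.0 is an assumed cutoff, not a value from the post.
    """
    a = auc(tpm_a, tpm_b)
    separation = max(a, 1.0 - a)  # direction-agnostic: up or down in group B
    mean_tpm = (sum(tpm_a) + sum(tpm_b)) / (len(tpm_a) + len(tpm_b))
    return (abs(log2fc) > min_abs_lfc
            and separation > min_auc
            and mean_tpm >= min_mean_tpm)


# Toy data: gene -> (log2 fold change, TPM in group A, TPM in group B)
genes = {
    "geneA": (2.1, [1, 2, 3, 2], [8, 9, 10, 7]),
    "geneB": (0.4, [5, 6, 5, 6], [6, 7, 6, 7]),
    "geneC": (-1.5, [0.2, 0.1, 0.3, 0.2], [0.05, 0.1, 0.0, 0.1]),
}
kept = [g for g, (lfc, a, b) in genes.items() if passes_filters(lfc, a, b)]
print(kept)
```

In this toy example geneB fails the fold-change filter and geneC fails the expression filter, so only geneA is kept.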
What I have done so far is a "leave one out" DE analysis. Basically, I run 56 (17+39) DE searches, leaving out one sample each time. That results in 56 lists of DE genes, and 39 genes were in all of them (after the filtering described above). It seems to me that these 39 genes should be the most robust DE genes. Is that true, or are there any inherent problems with such an approach? For example, I know that there is a "minReplicatesForReplace" option in DESeq2 which seems to do something similar.
And the second question is: could I further apply feature selection methods to that set of 39 genes? For example, RFE-SVM?
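The leave-one-out procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_de_caller` is a hypothetical stand-in for "DESeq2 plus the filters" (it just compares group means), and the sample data are invented. The point is the bookkeeping: count in how many leave-one-out runs each gene is selected, then keep the genes selected in every run.

```python
from collections import Counter


def leave_one_out_stability(samples, call_de_genes):
    """Run call_de_genes once per left-out sample.

    Returns gene -> fraction of runs in which the gene was selected.
    """
    counts = Counter()
    for i in range(len(samples)):
        subset = samples[:i] + samples[i + 1:]  # drop sample i
        counts.update(set(call_de_genes(subset)))
    n_runs = len(samples)
    return {gene: n / n_runs for gene, n in counts.items()}


def toy_de_caller(samples, min_diff=2.0):
    """Hypothetical DE caller: selects genes whose group means differ by > min_diff."""
    selected = []
    for gene in samples[0][1]:
        a = [expr[gene] for group, expr in samples if group == "A"]
        b = [expr[gene] for group, expr in samples if group == "B"]
        if abs(sum(a) / len(a) - sum(b) / len(b)) > min_diff:
            selected.append(gene)
    return selected


# Invented data: each sample is (group label, {gene: expression})
samples = [
    ("A", {"stable": 1.0, "outlier_driven": 1.0}),
    ("A", {"stable": 1.2, "outlier_driven": 1.0}),
    ("A", {"stable": 0.8, "outlier_driven": 1.0}),
    ("B", {"stable": 5.0, "outlier_driven": 1.0}),
    ("B", {"stable": 5.5, "outlier_driven": 1.0}),
    ("B", {"stable": 4.5, "outlier_driven": 10.0}),
]

stability = leave_one_out_stability(samples, toy_de_caller)
robust = sorted(g for g, freq in stability.items() if freq == 1.0)
print(stability)
print(robust)
```

In this toy data the gene called `outlier_driven` is selected only because of a single sample, so it drops out of the run where that sample is left out. Note that each pair of leave-one-out runs shares all but two samples, so surviving every run mainly shows that a gene does not depend on any single sample; it is not the same as validation on independent data.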
Best, Eugene
Thanks! I'll keep it in mind next time. Although I'd say that 39 is not that small a number when they all have to be validated in the lab; my task was to reduce the list of genes as far as possible.
Inevitably a quarter of them will be uncharacterized, so you'll probably ignore them. Others will be more interesting given whatever you're working on, so the final list after reading through the literature will probably be closer to 12, which is pretty doable with qPCR.