Hello Biostars, I did a differential expression analysis with DESeq2 using LFC = 1 and FDR = 0.05, but the list of significant differentially expressed genes I got is imbalanced (1345 upregulated and 38 downregulated genes). Does this mean my analysis is incorrect, or can this happen? Thanks.
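As a quick sanity check on the numbers, the thresholds described above (|log2 fold change| > 1, FDR < 0.05) can be reapplied to an exported results table. The Python sketch below assumes each row is a dict with `log2FoldChange` and `padj` keys, mirroring the column names of DESeq2's `results()` output. `None` stands in for DESeq2's `NA`, and the function name `count_de_direction` is hypothetical. This is post-hoc filtering. Testing against the threshold with the `lfcThreshold` argument of `results()` is a different procedure and can give different counts.

```python
def count_de_direction(rows, lfc_cutoff=1.0, fdr=0.05):
    """Count significant up- and down-regulated genes (hypothetical helper).

    Each row is a dict with 'log2FoldChange' and 'padj' keys. Genes whose
    padj is None (NA in DESeq2, e.g. after independent filtering) are skipped.
    A gene counts as up if log2FoldChange > lfc_cutoff, and as down if
    log2FoldChange < -lfc_cutoff (strict inequalities are an assumption).
    """
    up = down = 0
    for row in rows:
        padj = row["padj"]
        lfc = row["log2FoldChange"]
        if padj is None or padj >= fdr:
            continue  # not significant at the chosen FDR
        if lfc > lfc_cutoff:
            up += 1
        elif lfc < -lfc_cutoff:
            down += 1
    return up, down


# Toy rows, not real data.
rows = [
    {"log2FoldChange": 2.3, "padj": 0.001},   # up
    {"log2FoldChange": -1.5, "padj": 0.02},   # down
    {"log2FoldChange": 0.4, "padj": 0.0001},  # significant but |LFC| too small
    {"log2FoldChange": 3.0, "padj": None},    # NA padj, skipped
    {"log2FoldChange": 1.2, "padj": 0.2},     # not significant
]
print(count_de_direction(rows))  # (1, 1)
```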
Why should the number be balanced? You may indeed have more up-regulated genes than down-regulated ones in this analysis.
I've seen this before and it's always something to worry about. However, I've also seen it when analysing a dataset where an inhibitor of a global repressor of expression has been used - and it was consistent with other similar datasets.
You've got to wonder whether there is real biology going on, or whether there is a problem with your raw data or your pipeline. So, are there large differences in the quality, size, or complexity of the libraries? On an MA plot, is there any trend in logFC with respect to average expression? If you drop out each sample in turn and rerun DESeq2, does the bias remain in every iteration? Is there a confounding batch effect between the two arms?
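One way to put a number on the "trend in logFC with respect to average expression" question is to sort genes by mean expression, split them into bins of roughly equal size, and look at the median log2 fold change in each bin. The Python sketch below assumes plain lists of `baseMean` and `log2FoldChange` values taken from a results table. The helper name `lfc_trend_by_mean` and the choice of equal-count bins are illustrative assumptions, not a DESeq2 feature. A series of medians close to 0 suggests no intensity-dependent bias, while a steady rise or fall is the kind of trend you would see as a tilt on the MA plot.

```python
import math
from statistics import median


def lfc_trend_by_mean(base_means, log2_fcs, n_bins=5):
    """Median log2 fold change within bins of increasing mean expression.

    Genes are sorted by base mean and split into n_bins groups of roughly
    equal size. Returns a list of (lowest_mean, highest_mean, median_lfc)
    tuples, one per bin. Genes with a missing (None) fold change are ignored.
    """
    pairs = sorted(
        (m, lfc) for m, lfc in zip(base_means, log2_fcs) if lfc is not None
    )
    if not pairs:
        return []
    size = math.ceil(len(pairs) / n_bins)
    summary = []
    for start in range(0, len(pairs), size):
        chunk = pairs[start:start + size]
        summary.append(
            (chunk[0][0], chunk[-1][0], median([lfc for _, lfc in chunk]))
        )
    return summary


# Toy values with an obvious downward trend as mean expression increases.
means = [5, 10, 50, 100, 500, 1000]
lfcs = [1.5, 1.2, 0.6, 0.1, 0.0, -0.1]
for low, high, med in lfc_trend_by_mean(means, lfcs, n_bins=3):
    print(f"baseMean {low}-{high}: median log2FC {med:.2f}")
```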
Thanks. Yes, there is a batch effect between samples: they're not from the same run. I did the analysis with 24 test samples and 3 control samples. My test samples are tumor samples infected by a virus. I tried several ways of running the analysis. Without removing the batch effect, there are 1249 upregulated and 14 downregulated genes. After removing batch effects, there are 1345 upregulated and 38 downregulated genes. The number of expressed genes is not the same, either!
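Whether a batch effect can be separated from the tumor/control comparison at all depends on how the runs line up with the conditions. If every run contains only one condition, batch and condition are aliased. In that case a design such as `~ batch + condition` cannot tell the two effects apart. The Python sketch below cross-tabulates a sample sheet to check this. The `(sample_id, batch, condition)` tuple layout, the run labels, and the helper names are hypothetical, not taken from this thread.

```python
from collections import defaultdict


def batch_condition_table(samples):
    """Cross-tabulate batch against condition.

    samples: iterable of (sample_id, batch, condition) tuples.
    Returns {batch: {condition: count}}.
    """
    table = defaultdict(lambda: defaultdict(int))
    for _, batch, condition in samples:
        table[batch][condition] += 1
    return {batch: dict(counts) for batch, counts in table.items()}


def is_fully_confounded(table):
    """True if every batch contains samples from only one condition.

    In that situation the condition effect cannot be estimated separately
    from the batch effect.
    """
    return all(len(conditions) == 1 for conditions in table.values())


# Hypothetical sheet: all tumors in one run, all controls in another.
sheet = [
    ("T1", "run1", "tumor"),
    ("T2", "run1", "tumor"),
    ("C1", "run2", "control"),
    ("C2", "run2", "control"),
]
table = batch_condition_table(sheet)
print(table)                       # {'run1': {'tumor': 2}, 'run2': {'control': 2}}
print(is_fully_confounded(table))  # True
```

Adding even one control to `run1` would make `is_fully_confounded` return `False`, because that run would then contain both conditions.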
Could you post an MA plot?
Hello Russh, sorry for my late reply. Here is the MA plot after using lfcShrink.
Could you repost that, highlighting only the 38 downregulated and the 1345 upregulated features? It looks like there's a fairly even directional balance above a mean of ~100 normalised counts.
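The observation about balance above ~100 normalised counts can be checked numerically as well as visually. Tally the direction of all fold changes, and separately the up/down/non-significant labels used for highlighting, on either side of a mean-expression cutoff. The Python sketch below assumes result rows as dicts with `baseMean`, `log2FoldChange`, and `padj` keys. The cutoff of 100 comes from the comment above. The helper names and the strict |log2FC| > 1, padj < 0.05 rule are illustrative assumptions. If the significant calls sit mostly below the cutoff while all fold changes above it are roughly balanced, that points at low-count genes driving the asymmetry.

```python
from collections import Counter


def classify(row, lfc_cutoff=1.0, fdr=0.05):
    """Label a result row 'up', 'down', or 'ns' (not significant)."""
    padj = row["padj"]
    lfc = row["log2FoldChange"]
    if padj is not None and padj < fdr:
        if lfc > lfc_cutoff:
            return "up"
        if lfc < -lfc_cutoff:
            return "down"
    return "ns"


def direction_balance(rows, mean_cutoff=100.0):
    """Tally fold-change sign and significance label above/below a mean cutoff.

    Returns (sign_tally, label_tally), two Counters keyed by
    (region, value), where region is 'above' or 'below' mean_cutoff.
    Rows without a fold change (None) are skipped.
    """
    sign_tally, label_tally = Counter(), Counter()
    for row in rows:
        lfc = row["log2FoldChange"]
        if lfc is None:
            continue
        region = "above" if row["baseMean"] >= mean_cutoff else "below"
        sign = "positive" if lfc > 0 else "non-positive"
        sign_tally[(region, sign)] += 1
        label_tally[(region, classify(row))] += 1
    return sign_tally, label_tally


# Toy rows, not real data.
rows = [
    {"baseMean": 20, "log2FoldChange": 2.0, "padj": 0.01},
    {"baseMean": 30, "log2FoldChange": 1.5, "padj": 0.03},
    {"baseMean": 250, "log2FoldChange": 0.8, "padj": 0.01},
    {"baseMean": 400, "log2FoldChange": -0.7, "padj": 0.5},
    {"baseMean": 900, "log2FoldChange": -1.4, "padj": 0.001},
]
signs, labels = direction_balance(rows)
print(signs)
print(labels)
```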
Hello Russhh, I've posted the new MA plot.