Hello all,
I have some RNA-seq data from an unusual biological system, and as a result I need to look for genes that are differentially expressed genes between two samples, one of which was super stressed. My DESeq2 results said that more than 70% of genes had an adjusted p-value below 0.1! I was expecting a lot of differentially expressed genes, but this seems very high. Is there such a thing as 'too many' DEGs? Is there a way I can test whether my data fall into this category? Does anyone know of any solutions to this issue? Thanks in advance!
Note that one of the first steps of the DESeq2 process is library normalization, where DESeq normalizes based on assuming that the genes whose expression is the median is expressed the same between your samples. If this fundamental assumption is wrong, it's going to affect the results you get. You might need to think about some other kind of normalization.
Thanks for this response! I liked what Tiago mentioned about cutoffs so I made a volcano plot. In this plot, the blue is everything with padj < 1e-10, green is everything with log2 fold change > 5, and yellow is both. Would be really interested to hear anyone's thoughts on this. Not sure if it's ideal to just choose cutoffs that 'look good'. I'll look into other normalization methods like you suggest, though admittedly I'm not sure where to start...
Here is a little tip for your volcano plot: Make a column with information about who is DEG, and who is not