Standard tools for differential expression analysis (e.g. edgeR and DESeq2) assume that most genes are equally expressed across samples and that only a small fraction are differentially expressed. I was wondering how we can compare two very different RNA samples, for example one from muscle and the other from liver. I know some people just use more stringent criteria (e.g. 4-fold difference and FDR < 0.001). Is there a more statistically sound way to do the analysis? Thanks!
Are you sure about that?
For example: "Still, it is important to keep in mind that even these methods are based on an assumption that most genes are equivalently expressed in the samples." from https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-14-91
Thanks for the citation, I appreciate it! I won't actually believe it until I see it stated by the writers of the tools, but it doesn't seem unlikely.
That said - don't consider my opinion here to be authoritative, but people use those tools all the time for differential expression analysis between tissue types. RNA-seq always seems to be unpredictable and hard to reproduce, though, so I'm not really sure how you would validate that an approach is working correctly.
It's indeed an assumption of DESeq2 and similar tools. Now I'm trying to find a reference for that too...
Please correct me if I am wrong. I guess it is difficult to get "normalized counts" for very different samples. Indeed, most people just go ahead and use these tools with different organs or tissues, but I wonder if there is a better way. :-) Another way to think about this: maybe it is meaningless to look for differentially expressed genes between tissues, since they are already too different.
I guess you could normalize to an a priori selected set of housekeeping genes as a "stable background".
I know people use a selected set of housekeeping genes as controls for qPCR analysis. Could we do that in RNA-Seq analysis? Could you provide a reference for that? Thanks!
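To make the housekeeping-gene idea above a bit more concrete, here is a minimal sketch in Python of median-of-ratios size factors (the style of normalization DESeq2 uses), computed once from all genes and once from a control set only. The counts, the gene layout, and the names `size_factors` and `housekeeping` are all made up for illustration; this is not DESeq2's code. As far as I know, DESeq2's `estimateSizeFactors` accepts a `controlGenes` argument for the same purpose, but please check the documentation rather than take my word for it.

```python
import numpy as np

def size_factors(counts, control_idx=None):
    """Median-of-ratios size factors for a genes x samples count matrix.

    If control_idx is given, only those rows (assumed-stable genes) are used.
    """
    counts = np.asarray(counts, dtype=float)
    if control_idx is not None:
        counts = counts[control_idx]
    # Use only genes with nonzero counts in every sample (log is undefined at 0).
    expressed = (counts > 0).all(axis=1)
    log_counts = np.log(counts[expressed])
    # Per-gene log geometric mean across samples acts as the reference.
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Size factor per sample = median ratio of that sample to the reference.
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

# Hypothetical toy data: columns are (muscle, liver); liver has 2x the depth.
# Rows 0-2: assumed housekeeping genes, differing only by sequencing depth.
# Rows 3-7: tissue-specific genes, much higher in liver.
counts = np.array([
    [100, 200],
    [50, 100],
    [80, 160],
    [10, 200],
    [10, 200],
    [10, 200],
    [10, 200],
    [10, 200],
])
housekeeping = [0, 1, 2]

sf_all = size_factors(counts)
sf_hk = size_factors(counts, control_idx=housekeeping)

print("liver/muscle size factor ratio, all genes:   ", sf_all[1] / sf_all[0])
print("liver/muscle size factor ratio, housekeeping:", sf_hk[1] / sf_hk[0])
```

In this toy case most genes differ between the tissues, so the all-gene size factors absorb the biological difference, while the housekeeping-only size factors recover the 2x depth difference. Of course, this only helps if the chosen genes really are stable between the tissues, which is itself an assumption.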