Hello everybody,
I downloaded all available transcriptome experiments for a non-model organism from SRA in order to perform comprehensive analyses. I want to analyze both sequences and differential expression. While the former seems feasible across different experiments, the latter concerns me a bit.
Is it methodologically safe to compare differential expression levels across different experiments? All of them were run on Illumina machines, but on different instrument models (2000 / 2500 / 4000) and with different coverage. What steps do I have to perform in order to compare their expression levels? Can I pool them and do some sort of normalization?
If you have any reading to suggest (papers that do an analogous analysis), that would be excellent as well.
Q: Is it methodologically safe to compare differential expression levels across different experiments?
A: I did this exercise myself. In summary, I think it is possible, but it is usually non-trivial. The answer depends a lot on the kind of differential expression analysis that you want to do. The measured differential expression of a gene usually depends on both technical and experimental covariates. If the experimental covariates are actively driving the differential expression of a gene and the effect size is huge, it generally shows up in both studies seamlessly, and sometimes the expression profiles from different studies cluster very naturally after z-scoring. Often, though, the expression profiles cluster by study instead. When that happens, you can try ComBat or another batch correction tool. And yes, there are caveats associated with batch correction too.
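To make the "cluster by study or not" check a bit more concrete, here is a minimal sketch in Python with NumPy. It is not the exact procedure I used, and it is not ComBat. It assumes that "z-score" means standardizing each gene separately within each study, and it runs on simulated toy data (200 genes, two made-up studies "A" and "B", a study-specific shift per gene, and a true condition effect on the first 20 genes). The helper names `zscore_within_study` and `label_separation` are hypothetical, introduced only for this illustration.

```python
import numpy as np

def zscore_within_study(log_expr, studies):
    """Z-score each gene (row) separately within each study (group of columns)."""
    log_expr = np.asarray(log_expr, dtype=float)
    studies = np.asarray(studies)
    out = np.empty_like(log_expr)
    for s in np.unique(studies):
        cols = studies == s
        block = log_expr[:, cols]
        mean = block.mean(axis=1, keepdims=True)
        sd = block.std(axis=1, ddof=1, keepdims=True)
        sd[sd == 0] = 1.0  # genes constant within a study become 0, not NaN
        out[:, cols] = (block - mean) / sd
    return out

def label_separation(expr, labels):
    """Mean sample-sample correlation for pairs sharing a label,
    minus the mean for pairs with different labels."""
    labels = np.asarray(labels)
    corr = np.corrcoef(np.asarray(expr, dtype=float).T)  # samples x samples
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    return corr[same & off_diag].mean() - corr[~same].mean()

# Toy data (all values are simulated assumptions, not real measurements).
rng = np.random.default_rng(0)
n_genes = 200
studies = np.array(["A"] * 6 + ["B"] * 6)
study_idx = np.array([0] * 6 + [1] * 6)
condition = np.array([0, 0, 0, 1, 1, 1] * 2)

base = rng.normal(5.0, 1.0, size=(n_genes, 1))         # per-gene baseline
study_shift = rng.normal(0.0, 2.0, size=(n_genes, 2))  # per-gene batch effect
effect = np.zeros((n_genes, 1))
effect[:20] = 3.0                                       # 20 truly DE genes
noise = rng.normal(0.0, 0.3, size=(n_genes, len(studies)))
log_expr = base + study_shift[:, study_idx] + effect * condition + noise

z = zscore_within_study(log_expr, studies)

study_before = label_separation(log_expr, studies)
study_after = label_separation(z, studies)
cond_after = label_separation(z, condition)

print("study separation, raw:      ", round(study_before, 3))
print("study separation, z-scored: ", round(study_after, 3))
print("condition separation, z-scored:", round(cond_after, 3))
```

A large positive study separation means samples resemble their own study more than the other one, which is the "cluster by study" situation. Note that within-study z-scoring only makes sense when every study contains the conditions you want to compare; if a condition is confined to a single study, centering per study removes that signal along with the batch effect.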