4.0 years ago
garcesj
Hi there,
I'm merging multiple RNAseq datasets and, after differential expression analysis with DESeq2, I got a very weird volcano plot... there are some genes that seem extremely differentially expressed, with a very low logFC (-20?!).
I don't understand whether I'm overlooking something or merging these datasets incorrectly... all I've done is read each count matrix and concatenate them. Any ideas, please?
Not sure what you mean by "merging multiple RNAseq", but have you checked whether there is any batch effect? Do you have some samples shared across those datasets, or at least know of a gene that should remain more or less constant?
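If "merging" means binding the columns of separate count tables, one thing worth ruling out is that the rows were matched by position rather than by gene ID. Here is a minimal sketch in plain Python (standard library only) of a merge keyed on gene ID. The function name, the dict-based layout and the toy counts are all hypothetical, chosen just for illustration; this is not taken from your pipeline.

```python
def merge_by_gene(tables):
    """Merge count tables on gene ID.

    tables: list of dicts mapping gene ID -> list of counts (one per sample).
    Returns (merged, dropped): merged keeps only genes present in every
    table, with samples appended in table order; dropped lists the genes
    missing from at least one table.
    """
    shared = set(tables[0])
    for table in tables[1:]:
        shared &= set(table)

    merged = {}
    for gene in sorted(shared):
        row = []
        for table in tables:
            row.extend(table[gene])  # look up by ID, never by row position
        merged[gene] = row

    dropped = sorted(set().union(*tables) - shared)
    return merged, dropped


# Hypothetical toy data: the two tables list genes in a different order
# and do not contain exactly the same genes.
stage1 = {"geneA": [10, 12], "geneB": [0, 1], "geneC": [300, 280]}
stage2 = {"geneB": [2, 0], "geneA": [9, 11], "geneD": [5, 7]}

merged, dropped = merge_by_gene([stage1, stage2])
print(merged)
print(dropped)
```

If the gene lists differ between experiments, or come in a different order, a plain positional concatenation would silently pair counts from different genes, which could be one source of implausible fold changes.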
Thanks for the quick response. By "merging datasets" I mean combining multiple experiments performed at different times and on distinct samples. I've already removed the possible batch effects, so that cause is more or less excluded. And looking at a "constant" gene would be a nice solution, but they are also non-significant.
Can you elaborate on that? It is still unclear to me what you did. What do you mean by "different moments" and "distinct samples"? How did you correct for batches, and how was the batch effect diagnosed, if there was one? Could this simply be biological, like genes that are very active in one group but not the other?
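To illustrate that last point: a gene that is well expressed in one group and has (near) zero counts in the other gives an enormous log2 ratio almost by construction. This is a rough sketch with made-up counts. The pseudocount of 0.5 and the simple ratio of means are assumptions for illustration only; this is not how DESeq2 estimates its log fold changes.

```python
import math


def naive_log2fc(group_a, group_b, pseudocount=0.5):
    """Naive log2 fold change of group_b over group_a from mean counts.

    The pseudocount avoids taking the log of zero; its value is an
    arbitrary choice for this illustration.
    """
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return math.log2((mean_b + pseudocount) / (mean_a + pseudocount))


# Hypothetical counts for two genes across three samples per group.
on_off_gene = naive_log2fc([5000, 6200, 4800], [0, 0, 0])
doubled_gene = naive_log2fc([100, 120, 110], [210, 230, 220])

print(f"on/off gene:  {on_off_gene:.2f}")
print(f"doubled gene: {doubled_gene:.2f}")
```

The on/off gene lands far beyond the range of an ordinary two-fold change, so it is worth checking the raw or normalized counts of the extreme genes to see whether one group (or one batch) is essentially all zeros.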
I have two conditions to compare across three different disease stages. I processed each disease stage at a different time and, because the PCA shows no difference among them, I'm trying to merge them all together.
I identified two clear batch effects: the medium (coded as "lab" in the PCA I show) and the CT from the first reverse transcription in the RNAseq protocol... and I'm correcting for them (CT is numeric, so I'm categorising it) both for visualization (through limma::removeBatchEffect) and for DE analysis (by including them in the DESeq2 design).
The two conditions are very similar, so I didn't expect such a large change... moreover, if I analyse each disease stage separately, the absolute logFC is never higher than 5-6. I'm afraid this could just be an artifact. Thanks for your help!
Can you add the DESeq2 code to your post?