Weird volcano plot
4.0 years ago
garcesj ▴ 50

Hi there,

I'm merging multiple RNA-seq datasets and, after differential expression analysis with DESeq2, I got a very weird volcano plot... there are some genes that seem extremely differentially expressed, with a very low logFC (around -20?!).

[Image: volcano plot]

I don't know whether I'm overlooking something or merging these datasets incorrectly... all I've done is read each count matrix and concatenate them. Any ideas, please?
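For illustration only, here is a minimal sketch of one thing that can go wrong when count matrices are simply concatenated: if the gene rows are not in the same order (or not the same set) in every file, counts end up attached to the wrong gene. The sketch is plain Python rather than the R code used in the analysis, and the function name `merge_count_matrices` and the toy data are hypothetical. It joins the matrices on gene ID and keeps only the genes present in every dataset.

```python
def merge_count_matrices(matrices):
    """Merge per-dataset count matrices on shared gene IDs.

    Each matrix is a dict: gene ID -> dict of sample name -> raw count.
    Only genes present in every matrix are kept (an inner join), and a
    duplicate sample name raises an error instead of being overwritten.
    """
    if not matrices:
        return {}
    shared = set(matrices[0])
    for matrix in matrices[1:]:
        shared &= set(matrix)
    merged = {}
    for gene in sorted(shared):
        row = {}
        for matrix in matrices:
            for sample, count in matrix[gene].items():
                if sample in row:
                    raise ValueError(f"duplicate sample name: {sample}")
                row[sample] = count
        merged[gene] = row
    return merged


# Toy data (hypothetical): the second dataset lists genes in a different
# order and contains an extra gene that the first one lacks.
stage1 = {"geneA": {"s1": 10, "s2": 12}, "geneB": {"s1": 0, "s2": 3}}
stage2 = {
    "geneB": {"s3": 5, "s4": 7},
    "geneA": {"s3": 9, "s4": 11},
    "geneC": {"s3": 1, "s4": 0},
}
merged = merge_count_matrices([stage1, stage2])
```

Joining on the gene ID makes the row order in each file irrelevant. Whether to drop genes missing from some datasets (as here) or to fill them with zeros is a judgment call that depends on how each dataset was quantified.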

RNA-Seq volcano • 2.2k views

Not sure what you mean by "merging multiple RNAseq", but have you checked whether there is any batch effect? Do you have some samples shared between those datasets, or at least know of a gene that should remain more or less constant?


Thanks for the quick response. By "merging datasets" I mean combining several experiments performed at different times and on distinct samples. I've already removed the possible batch effects, so that cause is more or less excluded. Checking a "constant" gene would be a nice solution, but those genes are also non-significant.


Can you elaborate on that? It is still unclear to me what you did. What do "different moments" and "distinct samples" mean? How did you correct for batches, and how was the batch effect diagnosed, if there was one? Could this simply be biological, like genes that are very active in one group but not the other?
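To make that last question concrete, here is a rough sketch of how one could list genes that have no reads at all in one group but are clearly detected in the other. It is plain Python, not DESeq2 code; the function name `group_specific_genes`, the `min_count` threshold of 10 and the toy data are all assumptions for illustration, not something taken from the analysis in this thread.

```python
def group_specific_genes(counts, groups, min_count=10):
    """Flag genes with zero counts in every sample of one group and at
    least `min_count` reads in some sample of the other group.

    counts: gene ID -> dict of sample name -> raw count
    groups: sample name -> group label (exactly two labels expected)
    Returns a dict: gene ID -> label of the group where it is detected.
    """
    labels = sorted(set(groups.values()))
    if len(labels) != 2:
        raise ValueError("expected exactly two groups")
    first, second = labels
    flagged = {}
    for gene, row in counts.items():
        per_group = {first: [], second: []}
        for sample, count in row.items():
            per_group[groups[sample]].append(count)
        a, b = per_group[first], per_group[second]
        if not a or not b:
            continue  # gene has no samples in one of the groups
        if max(a) == 0 and max(b) >= min_count:
            flagged[gene] = second
        elif max(b) == 0 and max(a) >= min_count:
            flagged[gene] = first
    return flagged


# Toy data (hypothetical): geneA is detected in a single treated sample.
counts = {
    "geneA": {"s1": 0, "s2": 0, "s3": 250, "s4": 0},
    "geneB": {"s1": 40, "s2": 35, "s3": 38, "s4": 42},
}
groups = {"s1": "control", "s2": "control", "s3": "treated", "s4": "treated"}
flagged = group_specific_genes(counts, groups)
```

If the genes with the extreme logFC show up in such a list, looking at their raw counts per sample would help tell a real on/off difference from a single outlier sample.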


I have two conditions to compare across three different disease stages. I processed each disease stage at a different time and, because the PCA shows no difference among them, I'm trying to merge them all together.

[Image: PCA plot]

I identified two clear batch effects: the medium (coded as "lab" in the PCA shown) and the CT from the first reverse transcription in the RNA-seq protocol. I'm correcting for both (CT is numeric, so I'm converting it into categories): for visualization through limma::removeBatchEffect, and for the DE analysis by including them in the DESeq2 design.

The two conditions are very similar, so I didn't expect such a large change... moreover, if I analyse each disease stage separately, the absolute logFC is never higher than 5-6. I'm afraid this could be just an artifact. Thanks for your help!


Can you add the DESeq2 code to your post?
