RNA-seq using DESeq2: quality control of replicates using PCA and heatmaps
3.8 years ago
Ridha ▴ 130

Hello everyone!

I am using DESeq2 to analyze RNA-seq data from 6 samples (3 replicates per condition). I have normalized the counts and plotted them with PCA and a correlation heatmap to check whether the replicates are similar before proceeding to differential expression analysis. However, I am not sure whether I should exclude the Mock2 replicate (see links below), because its gene expression does not look like that of the other two Mock samples. That said, Mock2 differs from Mock1 and Mock3 mainly along the 2nd principal component, not the 1st, which explains most of the variance. As I understand it, this means that keeping Mock2 is not a big problem. My questions are:

1) Should I keep Mock2, or remove it from subsequent analyses because it could skew the results?

2) If I remove it, I would end up with only 2 replicates for one condition versus 3 for the other. Could that also skew the results?

Thank you very much in advance for your help!

https://ibb.co/fCjKrNq

https://ibb.co/gyh3dKQ

rna-seq R • 1.6k views

If you complete the analysis, how many DEGs show up? Since PC1 separates the samples cleanly by condition and explains 82% of the variance in the data, I suspect you will get a generous number of DEGs if there is an appreciable difference between the samples.
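
For readers unsure where a figure like "PC1 explains 82% of the variance" comes from, here is a minimal sketch in plain Python. It is not DESeq2's implementation: the expression matrix is invented toy data, and the helper names (`center_genes`, `sample_gram`, `leading_eigenpair`, `variance_explained`) are hypothetical. It centers each gene, builds the sample-by-sample cross-product matrix, and reports each leading eigenvalue as a fraction of the total.

```python
import math

# Toy log-scale expression matrix (genes x samples); all values are invented.
# Columns: Mock1, Mock2, Mock3, Normal1, Normal2, Normal3
expr = [
    [5.0, 5.1, 4.9, 8.0, 8.1, 7.9],
    [9.0, 8.9, 9.1, 6.0, 6.1, 5.9],
    [4.0, 6.0, 4.1, 4.0, 4.1, 3.9],  # gene where Mock2 departs from its group
    [7.0, 7.0, 7.1, 7.0, 6.9, 7.0],
    [3.0, 3.2, 2.9, 6.0, 5.9, 6.1],
]

def center_genes(matrix):
    """Subtract each gene's mean across samples."""
    centered = []
    for row in matrix:
        mean = sum(row) / len(row)
        centered.append([v - mean for v in row])
    return centered

def sample_gram(centered):
    """Sample-by-sample cross-product matrix of the centered data."""
    n = len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) for j in range(n)]
            for i in range(n)]

def leading_eigenpair(a, iterations=1000):
    """Largest eigenvalue and its eigenvector, by power iteration."""
    n = len(a)
    v = [1.0 / (i + 1) for i in range(n)]  # fixed, deterministic start vector
    for _ in range(iterations):
        w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # nothing left to explain
            return 0.0, v
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(a[i][j] * v[j] for j in range(n))
                     for i in range(n))
    return eigenvalue, v

def variance_explained(matrix, n_components=2):
    """Fraction of total variance captured by each leading component."""
    g = sample_gram(center_genes(matrix))
    n = len(g)
    total = sum(g[i][i] for i in range(n))  # trace = total variance (unscaled)
    fractions = []
    for _ in range(n_components):
        lam, v = leading_eigenpair(g)
        fractions.append(lam / total)
        # Deflate: remove the component just found before looking for the next.
        g = [[g[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return fractions

fractions = variance_explained(expr)
print([round(f, 3) for f in fractions])
```

In this toy matrix the condition difference dominates, so the first fraction is large and the second, driven by the single Mock2-specific gene, is small. That mirrors the pattern described in the question.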


Thank you very much for your help! Much appreciated!

3.8 years ago
seidel 11k

Your heat map coloring is set to highlight differences between the samples, but the actual correlations are all still high, and at least Mock2 is not clustering with the Normal samples. It simply looks like a slight outlier. In the PCA, too, it behaves differently from the Normal samples, even though it is split from its own group along PC2. Based on these plots, I wouldn't discard Mock2. However, I would make pairs plots of the log(CPM) comparing all the samples, just to get a sense of how or where the spread is coming from (a few genes, or an odd shape in the Mock2 data set?). Once you find some DE genes, I would also make bar plots comparing the signals from Mock and Normal, to convince yourself that the differences look real and reproducible.
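
As a numeric companion to the pairs plots suggested above, here is a rough sketch in plain Python. The counts are invented toy data, and the helper names (`log_cpm`, `pearson`, `pairwise_correlations`, `deviation_from_group`) and the 0.5 pseudocount are assumptions for illustration, not any package's API. It computes a simple log2 CPM, the pairwise sample correlations, and a ranking of genes by how far Mock2 sits from the mean of the other Mock replicates, which hints at whether the spread comes from a few genes.

```python
import math

# Toy raw counts per gene; all values are invented for illustration only.
samples = ["Mock1", "Mock2", "Mock3", "Normal1", "Normal2", "Normal3"]
counts = {
    "geneA": [100, 110, 95, 400, 420, 390],
    "geneB": [500, 480, 510, 120, 130, 110],
    "geneC": [50, 200, 55, 60, 58, 62],  # gene where Mock2 departs from its group
    "geneD": [1000, 990, 1010, 1005, 995, 1000],
    "geneE": [10, 12, 9, 80, 75, 85],
}

def log_cpm(counts, pseudocount=0.5):
    """Simple log2 counts-per-million, with a pseudocount to avoid log(0)."""
    n = len(samples)
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n)]
    return {
        gene: [math.log2((row[j] + pseudocount) / lib_sizes[j] * 1e6)
               for j in range(n)]
        for gene, row in counts.items()
    }

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pairwise_correlations(logcpm):
    """Correlation of log-CPM profiles for every pair of samples."""
    cols = [[row[j] for row in logcpm.values()] for j in range(len(samples))]
    return {
        (samples[i], samples[j]): pearson(cols[i], cols[j])
        for i in range(len(samples))
        for j in range(i + 1, len(samples))
    }

def deviation_from_group(logcpm, target, group):
    """Per gene: target's log-CPM minus the mean of the other group members,
    sorted by absolute size, largest first."""
    t = samples.index(target)
    others = [samples.index(s) for s in group if s != target]
    dev = {
        gene: row[t] - sum(row[k] for k in others) / len(others)
        for gene, row in logcpm.items()
    }
    return sorted(dev.items(), key=lambda kv: abs(kv[1]), reverse=True)

logcpm = log_cpm(counts)
corr = pairwise_correlations(logcpm)
ranked = deviation_from_group(logcpm, "Mock2", ["Mock1", "Mock2", "Mock3"])

for pair, r in corr.items():
    print(pair, round(r, 3))
for gene, d in ranked:
    print(gene, round(d, 2))
```

If only a handful of genes top the ranking by a wide margin, the outlier behaviour is probably gene-specific. A broad shift across many genes would instead point to a library-level issue.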

3.8 years ago

With only three replicates, you can't omit that sample based solely on your visualizations. If the people who made the library have evidence that the sample didn't behave properly, that would justify omitting it, but for all you know, that variance (and PC2 accounts for only 9%) is real.
