Question

deeptools 3.5 plotCorrelation

3.5 years ago

jjp55 ▴ 20

Hi all,

I recently got some chromatin profiling data and am trying to compare the samples using deeptools 3.5 multiBamSummary followed by plotCorrelation.

The code I am running is:

multiBamSummary bins --bamfiles *sorted.bam --binSize 10000 -o multiBamSummary.npz
plotCorrelation --corData multiBamSummary.npz --corMethod pearson --whatToPlot heatmap --skipZeros -o pearson_plot.png

My problem is that when I run this, the resulting plot is a heatmap where almost everything has a value between 0.9 and 1. I'm hoping the error is in my code and not my data. Does anyone have any suggestions?

Thank you!

ChIP-Seq sequencing plotCorrelation next-gen deeptools • 2.2k views

ADD COMMENT • link updated 3.5 years ago by Carlo Yague 8.9k • written 3.5 years ago by jjp55 ▴ 20

Why do you think there is an error? Correlation values between 0.9 and 1 mean that your samples are all highly correlated, which is not necessarily wrong.

ps: there is a typo in --corMethod person

ADD REPLY • link 3.5 years ago by Carlo Yague 8.9k

I think it is wrong because even my inputs are very highly correlated with my actual targets. When I do peak calling, I still get peaks in the target compared to the input. Additionally, I am comparing many different targets that do not give the same peaks, yet they still have a very high correlation.

ADD REPLY • link 3.5 years ago by jjp55 ▴ 20

Ok, I understand why you are concerned now. See my answer below.

ADD REPLY • link 3.5 years ago by Carlo Yague 8.9k

score 1 · Answer 1 · 2021-06-02

Ok, I understand why you are concerned now. Still, I don't think there is necessarily an error. In the code above, you are computing the Pearson correlation between conditions on 10,000 bp windows. It is possible that the signal (the peaks you called) gets buried in such large windows, so that almost everything is evened out between IP and input. So why do you get such a high correlation? A few outlier regions (subtelomeres, repeated regions, centromeres, ...) could have either very high or very low coverage in every condition. Since Pearson correlation is sensitive to outliers, that would lead to high correlation for all samples. You can check this hypothesis by plotting a scatterplot instead of a heatmap (--whatToPlot scatterplot) and looking for outlier windows.
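To put a rough number on how a peak can get buried in a large window, here is a back-of-the-envelope sketch in Python. The peak width (200 bp) and the 10-fold enrichment are made-up values for illustration, not taken from your data, and `bin_fold_change` is a hypothetical helper, not part of deeptools. It assumes uniform background coverage and a single peak per bin.

```python
def bin_fold_change(bin_size, peak_width, enrichment):
    """IP/input ratio of read counts in one bin containing a single peak.

    Assumes uniform background coverage in both samples and a peak that is
    `enrichment` times the background over `peak_width` bp (illustrative only).
    """
    background_bp = bin_size - peak_width
    ip_signal = background_bp + peak_width * enrichment  # in background units
    input_signal = bin_size
    return ip_signal / input_signal


# Hypothetical peak: 200 bp wide, 10x enriched over input.
fold_10kb = bin_fold_change(bin_size=10_000, peak_width=200, enrichment=10)
fold_1kb = bin_fold_change(bin_size=1_000, peak_width=200, enrichment=10)

print(f"10,000 bp bin: IP/input = {fold_10kb:.2f}")  # 1.18
print(f" 1,000 bp bin: IP/input = {fold_1kb:.2f}")   # 2.80
```

Under these assumptions, a 10-fold peak changes the count of a 10,000 bp bin by only about 18%, which is easy to lose among bins that contain no peak at all.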

My suggestion would be to try Spearman correlation instead (--corMethod spearman), which is much more robust to outliers. If this still does not work, lowering the window size to 1000 bp (--binSize 1000 in multiBamSummary) might increase sensitivity, at the cost of slower computation.
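If you want to see the outlier effect in isolation, here is a small simulation in plain Python. Everything in it is synthetic and assumed for illustration: two samples with completely independent coverage across 1000 bins, plus 5 bins with extreme coverage in both (standing in for repeats or centromeres). The functions are written by hand for the sketch and are not the deeptools implementation.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_bins, n_outliers = 1000, 5

# Independent "coverage" per bin: the two samples are unrelated by construction.
sample_a = [rng.gauss(100, 10) for _ in range(n_bins)]
sample_b = [rng.gauss(100, 10) for _ in range(n_bins)]

# A few bins with extreme coverage in BOTH samples (synthetic outliers).
sample_a += [rng.gauss(5000, 50) for _ in range(n_outliers)]
sample_b += [rng.gauss(5000, 50) for _ in range(n_outliers)]

r_pearson = pearson(sample_a, sample_b)
r_spearman = spearman(sample_a, sample_b)
print(f"Pearson:  {r_pearson:.3f}")
print(f"Spearman: {r_spearman:.3f}")
```

With these settings, Pearson should come out very close to 1 even though 1000 of the 1005 bins are unrelated, while Spearman stays near 0 because ranks cap the influence of the extreme bins. If your real data behave like this, the scatterplot will show most points in a blob near the origin and a handful far out along the diagonal.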