Question

Pearson correlation between ChIP-seq replicates is very poor: am I doing something wrong in the normalization?

6.6 years ago

Zee_S ▴ 60

Hello everyone!

I have several ChIP-seq samples and their respective inputs, and I want to find the Pearson correlation between different ChIP replicates.

I have read many posts and documentation about using deepTools, bedtools, etc. for this purpose, but for some reason my correlation values are much lower than I expected. Looking at the enrichment profiles of two IP replicates in IGV, they seem to overlap very well, but the r value doesn't suggest so.

I'm wondering whether I'm doing the calculation correctly, or whether this is just how the replicates are.

I would be grateful for your input on how to calculate the r values step by step, with deepTools, bedtools, or another tool. This would help me validate my method and be sure that the values I get are not wrong because of an incorrect normalization.

Many thanks!

correlation ChIP-Seq pearson • 2.9k views

ADD COMMENT • link updated 6.6 years ago by jomo018 ▴ 730 • written 6.6 years ago by Zee_S ▴ 60

So which steps are you doing exactly? I just follow the Galaxy training ChIP-seq tutorial, which is rather nice: http://galaxyproject.github.io/training-material/topics/chip-seq/tutorials/tal1-binding-site-identification/tutorial.html

So far results have been very sensible. However, it could be your replicates are screwed up, because on the wet lab side ChIP-seq is really hard.

ADD REPLY • link 6.6 years ago by colindaven 7.0k

Hello Colin, thanks a lot for your reply. I looked at the link you sent me, thank you! It looks like the tool used there for correlation is the deepTools multiBamSummary tool followed by the plotCorrelation tool. I am wondering whether this tool normalizes the BAM files for sequencing depth or library size, or whether it just uses raw counts. This is not mentioned in the documentation.

Ideally I would like to look at the correlation not between individual BAM files, but between the [IP/Input] enrichment of two replicates. I'm not sure multiBamSummary can do this. Any ideas?

ADD REPLY • link 6.6 years ago by Zee_S ▴ 60

In that case, if multiBamSummary can't handle groups of multiple BAM files, just combine the IP and input BAMs into IP.bam and input.bam, then run again. I wouldn't recommend this, though: replicates are there for a very, very good reason.

The docs on multiBamSummary are pretty clear:

"multiBamSummary computes the read coverages for genomic regions for typically two or more BAM files"

So these are non-normalized counts.

ADD REPLY • link 6.6 years ago by colindaven 7.0k

Thanks, Colin. It looks like a good tool for looking at coverage alone. multiBamSummary can indeed handle multiple BAM files, that's not the problem. I just do not understand how it would be informative for correlating two (or more) BAM files if they are not normalized for read count.

ADD REPLY • link 6.6 years ago by Zee_S ▴ 60
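On the read-count question above: Pearson's r is unchanged when either vector is multiplied by a positive constant, so scaling binned counts by library size alone does not change r. Non-linear steps (a log transform, an IP/input ratio, the bin size) do. A minimal Python sketch of this, assuming made-up bin counts and made-up library sizes (none of these numbers come from the thread):

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical read counts per genomic bin for two IP replicates.
rep1 = [12, 40, 7, 95, 3, 60]
rep2 = [30, 85, 10, 210, 9, 100]

# Hypothetical library sizes (total mapped reads); assumptions for illustration.
depth1, depth2 = 10_000_000, 22_000_000

# Counts-per-million scaling: each replicate is multiplied by one constant.
cpm1 = [c * 1e6 / depth1 for c in rep1]
cpm2 = [c * 1e6 / depth2 for c in rep2]

r_raw = pearson(rep1, rep2)
r_cpm = pearson(cpm1, cpm2)

# A log transform is not a linear rescaling, so r is generally different.
log1 = [math.log2(c + 1) for c in rep1]
log2_ = [math.log2(c + 1) for c in rep2]
r_log = pearson(log1, log2_)

print(f"raw counts: r = {r_raw:.6f}")
print(f"CPM-scaled: r = {r_cpm:.6f}")
print(f"log2(x+1):  r = {r_log:.6f}")
```

The first two printed values agree up to floating-point error, which is why raw counts from multiBamSummary can still be informative for a Pearson comparison between two BAM files.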

score 1 · Answer 1 · 2018-04-26

Consider the following:

1. Use deepTools bamCompare to create a bedGraph file comparing the treated sample against its input. The tool supports various normalization options. Just beware: when ignoreDuplicates is activated, the normalization is still computed on the reads including duplicates. In that case you need to apply a post-normalization to the bedGraph files.
2. Use R corrplot with the bedGraph files as input.

You can test several bin sizes and view the bedGraph files in IGV before running corrplot, to get a preliminary feeling for how your replicates behave.
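The idea behind the two steps above (per-bin IP versus input signal, then a correlation between replicates) can be sketched in plain Python. This is only an illustration under stated assumptions, not what bamCompare or corrplot do internally: the bin counts, library sizes, pseudocount of 1, and the helper names `log2_enrichment` and `pearson` are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def log2_enrichment(ip, inp, ip_depth, inp_depth, pseudocount=1.0):
    """Per-bin log2(IP / input) after scaling input counts to the IP depth.

    The pseudocount avoids division by zero and log of zero in empty bins.
    """
    scale = ip_depth / inp_depth
    return [
        math.log2((i + pseudocount) / (c * scale + pseudocount))
        for i, c in zip(ip, inp)
    ]

# Hypothetical per-bin read counts over the same genomic bins.
ip_rep1, input_rep1 = [12, 40, 7, 95, 3, 60], [10, 12, 8, 15, 4, 11]
ip_rep2, input_rep2 = [30, 85, 10, 210, 9, 100], [20, 26, 15, 33, 7, 25]

# Hypothetical library sizes (total mapped reads).
enr1 = log2_enrichment(ip_rep1, input_rep1, 10_000_000, 9_000_000)
enr2 = log2_enrichment(ip_rep2, input_rep2, 22_000_000, 20_000_000)

r = pearson(enr1, enr2)
print(f"Pearson r between replicate log2(IP/input) profiles: {r:.4f}")
```

With real data the per-bin values would come from the bedGraph files rather than from hard-coded lists, and repeating the calculation at several bin sizes shows how strongly r depends on that choice.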