Hello everyone!
I have several ChIP-seq samples and their respective inputs, and I want to find the Pearson correlation between different ChIP replicates.
I have read many posts and documentation about using deepTools, bedtools etc. for this purpose, but for some reason my correlation values are way lower than I expected! Just looking at the enrichment profiles of two IP replicates in IGV, they seem to overlap very well, but the r value doesn't suggest so.
I'm wondering if I'm doing the calculation correctly... or if this is just how the replicates are.
I would be grateful to receive your input on the step-by-step procedure for calculating the r values, either with deepTools, with bedtools or with another tool. This would help me very much to validate my method and be assured that the values I get are not wrong because of an incorrect normalization.
Many thanks!
So which steps are you doing exactly? I just follow the Galaxy training ChIP-seq tutorial, which is rather nice: http://galaxyproject.github.io/training-material/topics/chip-seq/tutorials/tal1-binding-site-identification/tutorial.html
So far my results with it have been very sensible. However, it could be that your replicates are screwed up, because on the wet lab side ChIP-seq is really hard.
Hello Colin, thanks a lot for your reply. I looked at the link you sent me, thank you! It looks like the tool mentioned there for correlation is the deepTools multiBamSummary tool followed by the plotCorrelation tool. I am wondering whether this tool normalizes the BAM files for sequencing depth or library size, or whether it is just a raw count? I could not find this in the documentation.
Ideally I would like to look at the correlation not between individual BAM files, but between the [IP/Input] enrichment for two replicates. I'm not sure multiBamSummary can do this. Any ideas?
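One way to picture the [IP/Input] correlation asked about above is the minimal Python sketch below. It is only an illustration under stated assumptions, not what any particular tool does: the per-bin counts and library sizes are made up, the helper names `pearson` and `log2_enrichment` are hypothetical, and the pseudocount of 1 is an arbitrary choice to avoid dividing by zero. Each sample is scaled to counts per million, log2(IP/Input) is computed per bin, and the two replicates' enrichment vectors are then correlated.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def log2_enrichment(ip_counts, input_counts, ip_total, input_total, pseudocount=1.0):
    """Per-bin log2(IP/Input) after scaling each sample to counts per million.

    The pseudocount is an assumption of this sketch, not a recommended value.
    """
    enrichment = []
    for ip, inp in zip(ip_counts, input_counts):
        ip_cpm = (ip + pseudocount) * 1e6 / ip_total
        input_cpm = (inp + pseudocount) * 1e6 / input_total
        enrichment.append(math.log2(ip_cpm / input_cpm))
    return enrichment

# Made-up read counts over the same six genomic bins (illustration only).
rep1_ip = [5, 40, 8, 120, 6, 60]
rep1_input = [6, 10, 7, 12, 5, 9]
rep2_ip = [12, 70, 15, 260, 9, 130]
rep2_input = [10, 18, 13, 20, 11, 16]

# Hypothetical library sizes (total mapped reads per sample).
rep1_enrich = log2_enrichment(rep1_ip, rep1_input, ip_total=2_000_000, input_total=1_500_000)
rep2_enrich = log2_enrichment(rep2_ip, rep2_input, ip_total=4_500_000, input_total=3_000_000)

r = pearson(rep1_enrich, rep2_enrich)
print(f"Pearson r between replicate enrichments: {r:.3f}")
```

On real data the per-bin counts would come from the BAM files rather than hand-typed lists. deepTools also has bamCompare and multiBigwigSummary, which might cover this workflow; their documentation is the place to check the exact options.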
In that case, if multiBamSummary can't handle groups of multiple BAM files, just combine the IP and input BAMs into IP.bam and input.bam, then run it again. I wouldn't recommend this, though: replicates are there for a very, very good reason.
The docs on multiBamSummary are pretty clear.
So these are non-normalized counts
Thanks, Colin. It looks like a good tool for just looking at coverage alone. multiBamSummary can indeed handle multiple BAM files, that's not the problem. I just do not understand how it would be informative for correlating two (or more) BAM files if they are not normalized for read count.
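On the normalization worry, it may help to note that Pearson's r does not change when each sample is multiplied by its own positive constant, which is all that a plain library-size scaling does. The sketch below checks this on made-up counts with hypothetical library sizes; the helper `pearson` is written out here for illustration and is not taken from deepTools. A log transform, by contrast, is not a linear rescaling, so it is computed separately for comparison.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up raw read counts over the same six bins (illustration only).
rep1 = [5, 40, 8, 120, 6, 60]
rep2 = [12, 70, 15, 260, 9, 130]

r_raw = pearson(rep1, rep2)

# Scale each sample to counts per million using hypothetical library sizes.
rep1_cpm = [c * 1e6 / 2_000_000 for c in rep1]
rep2_cpm = [c * 1e6 / 4_500_000 for c in rep2]
r_scaled = pearson(rep1_cpm, rep2_cpm)

# A log transform is non-linear, so it can give a different r.
r_log = pearson([math.log2(c + 1) for c in rep1],
                [math.log2(c + 1) for c in rep2])

print(f"raw counts: {r_raw:.6f}")
print(f"CPM-scaled: {r_scaled:.6f}")
print(f"log2(x+1):  {r_log:.6f}")
```

The first two values agree up to floating-point error. So if r looks too low on raw counts between two BAM files, missing depth normalization alone would not explain it. Things like bin size, outlier bins, or the large number of background bins may matter more, and an IP/Input ratio is a different quantity altogether.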