I have read a few papers that calculate the Pearson correlation coefficient between ChIP-seq replicates in promoter regions. However, even for active marks like H3K4me3, not all peaks are located within promoter regions.
Would it be more accurate to calculate the correlation over the peak regions in the BED file from MACS?
When calculating the correlation, what is best to look at: reads across the whole genome, reads in promoter regions, reads in called peak regions, or something else? Any comments are highly appreciated!
Hi Devon, thanks for your help! I have a follow-up question: suppose a bin contains no reads, or only a few background reads, in both replicates. The two replicates then agree almost perfectly in that bin, but this is not the kind of agreement we care about, and it will bias the correlation coefficient toward high values. For a histone mark or TF that binds only a small fraction of the genome, won't there be lots of such empty or background-only bins if random bins are used? Does this make sense?
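To make the concern concrete, here is a minimal sketch in plain Python. The read counts are invented for illustration and `pearson` is a hand-written helper, not a function from any ChIP-seq tool. It compares the Pearson correlation over four signal bins with the correlation after adding 100 bins that are empty in both replicates.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

# Hypothetical read counts in four bins that carry signal.
peaks_rep1 = [10, 20, 30, 40]
peaks_rep2 = [20, 10, 40, 30]

# Add 100 bins that are empty in both replicates.
all_rep1 = peaks_rep1 + [0] * 100
all_rep2 = peaks_rep2 + [0] * 100

r_peaks = pearson(peaks_rep1, peaks_rep2)
r_all = pearson(all_rep1, all_rep2)

# Dropping bins with no reads in either replicate leaves only the signal bins.
kept = [(a, b) for a, b in zip(all_rep1, all_rep2) if a + b > 0]
r_kept = pearson([a for a, _ in kept], [b for _, b in kept])

print(f"signal bins only: {r_peaks:.2f}")
print(f"with empty bins:  {r_all:.2f}")
print(f"after filtering:  {r_kept:.2f}")
```

In this toy case the correlation rises from about 0.60 to about 0.93 purely because of the shared empty bins, and filtering them out brings it back to 0.60. Real data will behave less cleanly, since background bins are rarely exactly zero.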
This is the reason we typically look at both Spearman's and Pearson's correlations. Spearman's is rank-based, so it is less dominated by a handful of very high-coverage bins.
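As a rough sketch of why the two coefficients can disagree, the snippet below uses invented counts: 99 background bins whose low counts are unrelated between replicates, plus four high-coverage bins. The helpers `pearson`, `average_ranks`, and `spearman` are written by hand here for illustration and are not taken from any particular package.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical background: counts of 0, 1 or 2 that are unrelated
# between the replicates (every combination occurs equally often).
rep1_bg = [i % 3 for i in range(99)]
rep2_bg = [(i // 3) % 3 for i in range(99)]

# Hypothetical high-coverage bins shared by both replicates.
rep1 = rep1_bg + [10, 20, 30, 40]
rep2 = rep2_bg + [20, 10, 40, 30]

pearson_all = pearson(rep1, rep2)
spearman_all = spearman(rep1, rep2)

print(f"Pearson:  {pearson_all:.2f}")
print(f"Spearman: {spearman_all:.2f}")
```

Here Pearson comes out around 0.90 because the four high bins dominate, while Spearman is around 0.12 because most bins are uncorrelated background. The gap is deliberately exaggerated; on real replicates the difference is usually much smaller.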
I tried Spearman correlation, and it does indeed give a slightly lower value than Pearson. Thanks a lot!