Question

Calculate ChIP-seq correlation in promoter regions

5.8 years ago

dcheng1 • 0

I have read a few papers that calculate the Pearson correlation coefficient between ChIP-seq replicates in promoter regions. However, even for active marks like H3K4me3, not all peaks are located within promoter regions.

Would it be more accurate to calculate the correlation using the peak regions in the BED file from MACS?

When calculating correlation, what is the best thing to look at: reads across the whole genome, reads in promoter regions, reads in called peak regions, or something else? Any comments are highly appreciated!
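Whichever set of regions is chosen (promoters, MACS peaks, or genome-wide bins), the computation itself is the same: count reads per region in each replicate, then correlate the two count vectors. Below is a minimal, stdlib-only Python sketch of that step, assuming reads have already been reduced to start positions per chromosome and regions are read from BED lines (0-based, half-open). The helpers `parse_bed`, `count_reads`, and `pearson` and the toy data are hypothetical; on real BAM files, deepTools `multiBamSummary` (in its `BED-file` mode) does the counting.

```python
import bisect
import math


def parse_bed(lines):
    """Parse BED lines into (chrom, start, end) tuples, ignoring extra columns."""
    regions = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        regions.append((fields[0], int(fields[1]), int(fields[2])))
    return regions


def count_reads(read_starts, regions):
    """Count reads whose start lies in each half-open [start, end) region.

    read_starts maps chromosome name -> list of read start positions.
    """
    sorted_starts = {chrom: sorted(pos) for chrom, pos in read_starts.items()}
    counts = []
    for chrom, start, end in regions:
        pos = sorted_starts.get(chrom, [])
        # positions < end minus positions < start = positions in [start, end)
        counts.append(bisect.bisect_left(pos, end) - bisect.bisect_left(pos, start))
    return counts


def pearson(x, y):
    """Pearson's r; raises ZeroDivisionError if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical toy data: three peaks and read starts for two replicates.
peaks = parse_bed([
    "chr1\t100\t200\tpeak1\n",
    "chr1\t500\t650\tpeak2\n",
    "chr1\t900\t1000\tpeak3\n",
])
rep1 = {"chr1": [110, 150, 190, 520, 530, 600, 610, 640, 950]}
rep2 = {"chr1": [120, 180, 510, 560, 590, 620, 960, 970]}
c1 = count_reads(rep1, peaks)
c2 = count_reads(rep2, peaks)
r = pearson(c1, c2)
print(c1, c2, round(r, 3))  # [3, 5, 1] [2, 4, 2] 0.866
```

Swapping `peaks` for promoter intervals or genome-wide bins changes only which regions are counted, not the correlation step.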

Written 5.8 years ago by dcheng1; updated 5.8 years ago by Devon Ryan

score 1 · Answer 1 · 2019-02-26

5.8 years ago

Devon Ryan 104k

I would suggest just using random bins across the genome rather than only bins in promoters. That produces less bias and gives you a better overall view of how correlated your samples actually are.
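A minimal sketch of the genome-wide alternative, assuming a fixed bin size and a dictionary of chromosome sizes: tile every chromosome into equal-width bins and, if the full set is too large, draw a random subset with a fixed seed so the result is reproducible. The function names, the 10 kb bin width, and the toy genome are illustrative assumptions; for BAM files, deepTools `multiBamSummary bins` does genome-wide binning and counting.

```python
import random


def genome_bins(chrom_sizes, bin_size):
    """Tile each chromosome into consecutive half-open [start, end) bins.

    The last bin of a chromosome is truncated at the chromosome end.
    """
    bins = []
    for chrom, size in chrom_sizes.items():
        for start in range(0, size, bin_size):
            bins.append((chrom, start, min(start + bin_size, size)))
    return bins


def sample_bins(bins, n, seed=0):
    """Pick n bins uniformly at random without replacement, reproducibly."""
    rng = random.Random(seed)
    return sorted(rng.sample(bins, n))


# Hypothetical toy genome; real sizes would come from a chrom.sizes or .fai file.
chrom_sizes = {"chrA": 25_000, "chrB": 12_000}
bins = genome_bins(chrom_sizes, bin_size=10_000)
subset = sample_bins(bins, n=3)
print(len(bins), bins[2])  # 5 ('chrA', 20000, 25000)
```

The resulting (chrom, start, end) tuples are counted per replicate in the same way as promoter or peak intervals, so they feed into the same correlation step.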

Hi Devon, thanks for your help! I have a follow-up question. Suppose a bin contains no reads, or only a few background reads, in both replicates. The two replicates then agree almost perfectly in that bin, but that is not the kind of agreement we care about, and many such bins will bias the correlation coefficient toward high values. For histone marks or TFs that occupy only a small fraction of the genome, won't random bins produce lots of these empty or background-only bins? Does this make sense?
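The concern can be checked with a small, made-up example (a minimal sketch, not a benchmark): four bins carry signal whose counts are uncorrelated between replicates, and 96 further bins have no reads in either. Including the empty bins pulls both means close to zero, so every signal bin lies above the mean in both replicates and Pearson's r rises from 0 to about 0.83. The hypothetical helper `drop_shared_zeros` discards bins that are zero in both replicates, similar in spirit to the `--skipZeros` option of deepTools `plotCorrelation`; it does not remove bins with a few background reads, which would need an additional, arbitrary count threshold.

```python
import math


def pearson(x, y):
    """Pearson's r; raises ZeroDivisionError if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_shared_zeros(x, y):
    """Keep only bins where at least one replicate has a non-zero count."""
    kept = [(a, b) for a, b in zip(x, y) if a != 0 or b != 0]
    return [a for a, _ in kept], [b for _, b in kept]


# Hypothetical toy data: four signal bins whose counts are uncorrelated
# between replicates, plus 96 bins with no reads in either replicate.
signal_rep1 = [10, 20, 30, 40]
signal_rep2 = [30, 10, 40, 20]
rep1 = signal_rep1 + [0] * 96
rep2 = signal_rep2 + [0] * 96

r_all = pearson(rep1, rep2)                       # empty bins included
r_kept = pearson(*drop_shared_zeros(rep1, rep2))  # empty bins removed
print(f"{r_all:.3f} {r_kept:.3f}")  # 0.828 0.000
```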

5.8 years ago by dcheng1

This is the reason we typically look at both Spearman's and Pearson's correlations.
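A minimal sketch of why the two coefficients complement each other, using made-up counts: Pearson's r works on the raw counts, so a single very high-count bin can dominate it, whereas Spearman's rho is Pearson's r computed on ranks (tied values, such as many empty bins, share an averaged rank), so every bin weighs equally. Here five ordinary bins are perfectly anti-correlated and one extreme bin is high in both replicates: Pearson reports about 0.997, Spearman about -0.14. The helpers are hypothetical; `scipy.stats.pearsonr` and `scipy.stats.spearmanr` compute the same quantities, and deepTools `plotCorrelation` chooses between them with `--corMethod`.

```python
import math


def pearson(x, y):
    """Pearson's r; raises ZeroDivisionError if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 0-based positions i..j -> ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical counts: five ordinary bins plus one extreme bin shared by both replicates.
rep1 = [1, 2, 3, 4, 5, 100]
rep2 = [5, 4, 3, 2, 1, 100]
print(f"Pearson {pearson(rep1, rep2):.3f}, Spearman {spearman(rep1, rep2):.3f}")
# Pearson 0.997, Spearman -0.143
```

A large gap between the two coefficients is a hint that a few high-count bins, or a mass of near-empty ones, are driving the Pearson value.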

5.8 years ago by Devon Ryan

I tried Spearman correlation, and it indeed gives a slightly lower correlation than Pearson did. Thanks a lot!

5.8 years ago by dcheng1