Question

How to do correlation analyses between RNA-seq and ChIP-seq libraries?

6.6 years ago

Zee_S ▴ 60

Hello Biostars community,

I would be grateful for your ideas on a pipeline to correlate the reads of an RNA-seq dataset with those of a ChIP-seq dataset. I have read many posts suggesting that one can calculate counts per million (CPM) per genomic bin for each dataset and correlate the two. But I have two questions regarding this approach:

1) How do you account for the large number of zero-read bins in the RNA-seq sample, which arise because RNA-seq signal is restricted to the expressed regions of the genome? These zero bins will be included in the correlation calculation.

2) An RNA-seq profile is only an enrichment profile (i.e. the coverage is always >= 0), whereas a ChIP-seq profile shows both enrichment and depletion. How do you take this into account when comparing the two and computing the correlation?

Many thanks in advance for your help and suggestions!

RNA-Seq ChIP-Seq correlation normalization • 2.6k views

ADD COMMENT • link updated 6.6 years ago by Hussain Ather ▴ 990 • written 6.6 years ago by Zee_S ▴ 60

Good point. Zero bins are often (optionally) excluded by the tool that builds the count matrix and performs the correlation, e.g. deepTools.
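To make the idea concrete, here is a minimal sketch in plain Python of CPM per bin followed by a Pearson correlation, computed once over all bins and once after dropping bins that are zero in both samples. The counts are invented toy values, the helper names (`cpm`, `pearson`, `drop_shared_zero_bins`) are hypothetical, and dropping only bins that are zero in every sample is an assumption about how such a filter typically works, not a description of any specific tool's internals.

```python
import math

def cpm(counts):
    """Scale raw per-bin read counts to counts per million mapped reads."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_shared_zero_bins(x, y):
    """Keep a bin unless it is zero in BOTH samples (assumed filter rule)."""
    kept = [(a, b) for a, b in zip(x, y) if a != 0 or b != 0]
    return [a for a, _ in kept], [b for _, b in kept]

# Toy per-bin read counts (hypothetical data, one value per genomic bin).
rna_counts = [0, 0, 0, 50, 120, 0, 30, 0]
chip_counts = [0, 5, 0, 40, 90, 10, 20, 0]

rna_cpm = cpm(rna_counts)
chip_cpm = cpm(chip_counts)

# Correlation over every bin, including the shared zeros.
r_all = pearson(rna_cpm, chip_cpm)

# Correlation after removing bins with no signal in either sample.
rna_nz, chip_nz = drop_shared_zero_bins(rna_cpm, chip_cpm)
r_nonzero = pearson(rna_nz, chip_nz)

print(f"all bins: r = {r_all:.3f} (n = {len(rna_cpm)})")
print(f"shared zeros dropped: r = {r_nonzero:.3f} (n = {len(rna_nz)})")
```

Comparing the two values on real data shows how much of the correlation is driven by bins that are empty in both libraries.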

ADD REPLY • link 6.6 years ago by colindaven 7.0k

Yes, and if you get rid of those zero-read bins, you are essentially getting rid of any potential negative correlation with your ChIP-seq data that you were looking for in the first place. Or am I wrong on this point?
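A toy check of this concern, as a rough sketch rather than a real analysis: the numbers below are invented to mimic a repressive-mark-like ChIP-seq signal that is high exactly where RNA-seq is zero. The `pearson` helper and the data are hypothetical. The sketch filters on RNA-seq zeros alone, which is the case described here; a filter that drops only bins that are zero in every sample would keep these bins, because the ChIP-seq signal in them is non-zero.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical per-bin signal: the first four bins are silent in RNA-seq
# but carry a strong ChIP-seq signal; the last four bins are expressed.
rna = [0, 0, 0, 0, 10, 20, 30, 40]
chip = [90, 80, 85, 95, 10, 12, 9, 11]

# All bins: the silent-but-marked bins drive a strong anti-correlation.
r_all = pearson(rna, chip)

# Only bins with RNA-seq signal: the anti-correlation is no longer visible.
expressed = [(a, b) for a, b in zip(rna, chip) if a > 0]
rna_expr = [a for a, _ in expressed]
chip_expr = [b for _, b in expressed]
r_expressed = pearson(rna_expr, chip_expr)

print(f"all bins: r = {r_all:.2f}")
print(f"expressed bins only: r = {r_expressed:.2f}")
```

In this constructed case the negative relationship lives entirely in the bins that are zero in RNA-seq, so which zero bins get dropped matters more than whether zeros are dropped at all.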

ADD REPLY • link 6.6 years ago by Zee_S ▴ 60

score 1 · Answer 1 · 2018-04-27

6.6 years ago

Hussain Ather ▴ 990

1) You could try normalizing the bins with respect to some larger value.

2) You could try taking the absolute value of the ChIP-Seq profile to account for both enrichment and depletion.
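One possible reading of suggestion 2, sketched under explicit assumptions: the ChIP-seq profile is expressed as a per-bin log2 ratio of ChIP over input (positive for enrichment, negative for depletion), and the absolute value turns it into a magnitude of deviation from input. The log2-ratio representation, the pseudocount, the `log2_ratio` helper and the toy CPM values are all assumptions made for illustration and are not stated in the answer.

```python
import math

def log2_ratio(chip_cpm, input_cpm, pseudocount=1.0):
    """Per-bin log2(ChIP / input); the pseudocount avoids division by zero."""
    return [
        math.log2((c + pseudocount) / (i + pseudocount))
        for c, i in zip(chip_cpm, input_cpm)
    ]

# Hypothetical per-bin CPM values for a ChIP sample and its input control.
chip_cpm = [7.0, 1.0, 3.0, 0.0, 15.0]
input_cpm = [1.0, 1.0, 3.0, 3.0, 3.0]

# Signed profile: > 0 means enrichment, < 0 means depletion relative to input.
signed = log2_ratio(chip_cpm, input_cpm)

# Absolute value: strength of the deviation, with its direction discarded.
magnitude = [abs(v) for v in signed]

print(signed)
print(magnitude)
```

Taking the absolute value treats an enriched bin and an equally depleted bin as identical, so a correlation against RNA-seq then measures association with any deviation from input rather than with enrichment specifically.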

ADD COMMENT • link 6.6 years ago by Hussain Ather ▴ 990

Hi Hussain,

Thanks a lot for your reply. Could you kindly elaborate on the two approaches you mention above?

(1) A larger value such as what? And how does this solve the issue of zero-read bins in the RNA-seq dataset being included in the correlation?

(2) What do you mean by absolute value?

Thank you

ADD REPLY • link 6.6 years ago by Zee_S ▴ 60