Question

Pearson Correlation test between two Chip-Seq data sets

0

Entering edit mode

9.5 years ago

dally ▴ 210

I was wondering how to run a pearson correlation test between two chip-seq data sets. I have narrowpeak files for both which are essentially bed files.

Is there any bioconductor packages in R that can do this? I've tried a variety of bedtools, and programs such as CHANCE, but they haven't given me what I required.

I used MACS2 to call peaks.

I used Bowtie to align to hg19.

Thanks!

pearsons ChIP-Seq R • 9.7k views

ADD COMMENT • link updated 2.8 years ago by Ram 45k • written 9.5 years ago by dally ▴ 210

0

Entering edit mode

Do you know what Pearson correlation is? It measures the collinearity between two one dimensional measures.

ADD REPLY • link updated 5.5 years ago by Ram 45k • written 9.5 years ago by karl.stamm 4.1k

0

Entering edit mode

I am only vaguely familiar with the test. I've seen it on a large amount of papers recently while reading up on a new project. Essentially what I am trying to do as quoted from a paper is "find the number of uniquely aligned reads per Pol II gene for two complete biological replicates generated from * insert cell type * cells (Pearson's correlation, R = 0.##)

ADD REPLY • link 9.5 years ago by dally ▴ 210

0

Entering edit mode

There's no test. There's no hypothesis, and there's no p-value. The correlation is just a measurement.

ADD REPLY • link 9.5 years ago by karl.stamm 4.1k

0

Entering edit mode

You can window the data along the genome and calculate a correlation for the values in each window, but you definitely want to filter low windows. This post had some additional ideas.

ADD REPLY • link 9.5 years ago by Madelaine Gogol 5.3k

Ram · Answer 1 · 2015-11-09

3

Entering edit mode

9.5 years ago

Constantine ▴ 290

Try the bamCorrelate function from the deeptools software

ADD COMMENT • link updated 5.5 years ago by Ram 45k • written 9.5 years ago by Constantine ▴ 290

3

Entering edit mode

We have updated to deepTools 2.0 and now bamCorrelate is called multiBamCoverage. The output of that program can be used to compute a correlation (using plotCorrelation) of a PCA (using plotPCA). The new documentation is here:

http://deeptools.readthedocs.org/en/latest/content/tools/multiBamCoverage.html

ADD REPLY • link updated 2.8 years ago by Ram 45k • written 9.3 years ago by Fidel ★ 2.0k

0

Entering edit mode

Thank you, this was exactly what I needed.

ADD REPLY • link 9.5 years ago by dally ▴ 210

Ram · Answer 2 · 2015-11-06

1

Entering edit mode

9.5 years ago

karl.stamm 4.1k

The "number of uniquely aligned reads per PolII gene" is really just a row of a table of integers.

Sample    Gene1    Gene2    Gene3
Rep1      10       20       30
Rep2      12       18       40

For that table, you can compute the correlation in R like

> cor(c(10,20,30),c(12,18,40))
[1] 0.9496528

and say they are 95% correlated.

So your first step is to generate the table, specifying gene column headers to help you count reads at each site.

ADD COMMENT • link updated 5.5 years ago by Ram 45k • written 9.5 years ago by karl.stamm 4.1k

0

Entering edit mode

This makes much more sense. Is there any software that will generate this table for me? Or is this a sort of "gotta build your own custom script" thing? I do mostly wet lab work, so something with custom scripts is probably out of my ballpark.

ADD REPLY • link 9.5 years ago by dally ▴ 210

0

Entering edit mode

There's a lot of different Biostars questions asked and answered for counting reads. Look for "how to count reads". The most popular tool is probably "htseq-count"

I like coverageBed from bedtools. You give it the BAM and a BED annotating your regions of interest, and it spits out how many reads were in each site.

ADD REPLY • link updated 5.5 years ago by Ram 45k • written 9.5 years ago by karl.stamm 4.1k