Pearson Correlation test between two Chip-Seq data sets
2
0
Entering edit mode
9.0 years ago
dally ▴ 210

I was wondering how to run a pearson correlation test between two chip-seq data sets. I have narrowpeak files for both which are essentially bed files.

Is there any bioconductor packages in R that can do this? I've tried a variety of bedtools, and programs such as CHANCE, but they haven't given me what I required.

I used MACS2 to call peaks.

I used Bowtie to align to hg19.

Thanks!

pearsons ChIP-Seq R • 9.1k views
ADD COMMENT
0
Entering edit mode

Do you know what Pearson correlation is? It measures the collinearity between two one dimensional measures.

ADD REPLY
0
Entering edit mode

I am only vaguely familiar with the test. I've seen it on a large amount of papers recently while reading up on a new project. Essentially what I am trying to do as quoted from a paper is "find the number of uniquely aligned reads per Pol II gene for two complete biological replicates generated from * insert cell type * cells (Pearson's correlation, R = 0.##)

ADD REPLY
0
Entering edit mode

There's no test. There's no hypothesis, and there's no p-value. The correlation is just a measurement.

ADD REPLY
0
Entering edit mode

You can window the data along the genome and calculate a correlation for the values in each window, but you definitely want to filter low windows. This post had some additional ideas.

ADD REPLY
3
Entering edit mode
9.0 years ago
Constantine ▴ 290

Try the bamCorrelate function from the deeptools software

ADD COMMENT
3
Entering edit mode

We have updated to deepTools 2.0 and now bamCorrelate is called multiBamCoverage. The output of that program can be used to compute a correlation (using plotCorrelation) of a PCA (using plotPCA). The new documentation is here:

http://deeptools.readthedocs.org/en/latest/content/tools/multiBamCoverage.html

ADD REPLY
0
Entering edit mode

Thank you, this was exactly what I needed.

ADD REPLY
1
Entering edit mode
9.0 years ago

The "number of uniquely aligned reads per PolII gene" is really just a row of a table of integers.

Sample    Gene1    Gene2    Gene3
Rep1      10       20       30
Rep2      12       18       40

For that table, you can compute the correlation in R like

> cor(c(10,20,30),c(12,18,40))
[1] 0.9496528

and say they are 95% correlated.

So your first step is to generate the table, specifying gene column headers to help you count reads at each site.

ADD COMMENT
0
Entering edit mode

This makes much more sense. Is there any software that will generate this table for me? Or is this a sort of "gotta build your own custom script" thing? I do mostly wet lab work, so something with custom scripts is probably out of my ballpark.

ADD REPLY
0
Entering edit mode

There's a lot of different Biostars questions asked and answered for counting reads. Look for "how to count reads". The most popular tool is probably "htseq-count"

I like coverageBed from bedtools. You give it the BAM and a BED annotating your regions of interest, and it spits out how many reads were in each site.

ADD REPLY

Login before adding your answer.

Traffic: 2541 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6